We report on efforts to design the "Collaborative Workshop Physics" (CWP) instructional strategy
to deliver the first interactive engagement (IE) calculus-based physics course at Khalifa University
of Science, Technology and Research (KU), United Arab Emirates (UAE). To these authors'
knowledge, this work reports the first calculus-based physics course on the Arabian Peninsula us- 
ing Physics Education Research (PER)-based instruction. A brief history and present context of 
general university and science/engineering teaching in the UAE is given. From this frame, a pre- 
reform baseline is presented for KU's calculus-based introductory mechanics course in its traditional, 
lecture(r)-centered form, as established by Force Concept Inventory (FCI) and Maryland Physics
Expectations (MPEX) survey data, as well as course exam grades. The first semester
of delivery in the prototype CWP modality has identified several key challenges for further
reforms: (1) second-language acquisition, (2) secondary and post-secondary academic preparation,
and (3) culture-specific gender issues. We identify and characterize each of these issues quantita- 
tively through analysis of pre/post survey, course exam and standardized test data. We find that 
for students with high English language proficiency, normalized gain on FCI improves substantially,
from ⟨g⟩ = 0.16 ± 0.10 pre-reform to ⟨g⟩ = 0.47 ± 0.08 in the CWP pilot (standard errors). However,
we also find evidence that normalized gains on FCI are strongly modulated by language proficiency
and discuss likely causes. Regardless of language ability, problem-solving skill is also substantially
improved and course drop-fail-withdrawal (DFW) rates are cut from 50% to 24%. Open questions
are identified and recommendations are made for future improvements, relevant to KU, universities 
in the broader Gulf region and other institutions in the developing world facing similar challenges 
involving secondary implementation of PER-based instructional strategies. 



I. INTRODUCTION 



The use of interactive-engagement (IE) instructional 
strategies and curriculum resources developed through 
physics education research (PER) in North America
and Europe has produced improved student problem-solving
performance and deeper conceptual understanding
relative to lecture-centered instruction (e.g. in introductory
mechanics). More recently, increased attention
has been given to the complications, and their mitigation,
arising during secondary implementations of PER-based
curricula (e.g. Ref. 4) and to institutionalizing successful
PER-based reforms. Specifically, evidence presented
in Refs4i"— shows that the broader contexts in which 
an interactive-engagement course is implemented are, for
the success and sustainability of the implementation, at 
least as important as how well PER-based learning tasks 
are executed in the classroom. These broader contexts 
can include the departmental, institutional, student and 
faculty idio-/ethno-cultural contexts. Several broad research
questions are raised, given these demonstrations
of the importance of context. Specifically, how far away 
from the context of the developing institution can a PER- 
based instructional strategy be implemented? If one of 



the broader contexts mentioned above is very different to 
that of the original developing institution, are there crite- 
ria on these contexts that can help faculty who are plan- 
ning a secondary implementation to predict possible risks 
for their reform project? Following from this, in terms 
of the implementation, how and to what extent can the 
original instructional strategy be changed in anticipation 
of these failure risks, so as to better match the contexts of 
the implementing institution, without compromising that 
instructional strategy's core functions? Perhaps most im- 
portantly, is there a generic change strategy that faculty 
groups can follow to help them achieve success and sus- 
tainability for their efforts across all of the relevant con- 
texts, especially departmental and institutional contexts? 

The present work makes contributions toward an- 
swering these questions, by reporting on a modified 
implementation of cooperative group problem-solving 
(CGPS) in a United Arab Emirates (UAE) context at
Khalifa University of Science, Technology, and Research 
(KU hereafter). Motivated by Refs.^'^*, this work also 
presents a design-based approach for choosing and chang- 
ing the CGPS instructional strategy based on an analysis 
of the cultural expectations of its users (students), and 
presents a post-analysis of efficacy. Our long term vision 
at KU is to address major questions related to secondary 
implementations for the UAE and the broader Arabian
Peninsula/Gulf Arab context. The first step toward this
goal is to answer certain narrower, more concrete questions
for the KU/UAE context, which are:

1. On the equivalence of lecture-centered approaches:
Is there a correspondence between features and ef- 
fects of traditional, lecture-centered instruction in 
the US and those of lecture-centered instruction in 
the UAE? Do lecture-centered approaches to in- 
troductory, calculus-based physics in the two soci- 
eties share the same features, in terms of classroom 
expectations and norms, instructional approaches 
and curriculum content, and the effects of the in- 
struction on conceptual learning, problem-solving 
skill and course drop-fail-withdrawal (DFW) rates?

2. On the equivalence of IE approaches: Does use of 
the CGPS approach, thoughtfully modified, pro- 
duce improvements in student conceptual learning 
and problem-solving ability in KU students that 
are similar to comparable secondary implementa- 
tions presented in the PER literature? 

3. On identification and mitigation of failure risks 
for secondary implementations: Are there features 
of KU/UAE contexts that threaten successful and 
sustainable implementation of CGPS and how can 
these risks be mitigated? Are these features sim- 
ilar to those faced by secondary implementations 
conducted within the US or are there qualitatively 
different challenges? What changes to the CGPS 
approach are suggested by these differences and 
to what extent can an instructional strategy be 
adapted to them while maintaining its basic in- 
tegrity? 

To answer these questions, this work is structured as follows.
In Sec. II, a brief overview of the UAE and KU
contexts will be given, starting with a historical perspec- 
tive and a summary of the present state of affairs, with 
emphasis on the role and perception of higher education 
in UAE society. The section focuses on a variety of mea- 
sures of student values and expectations, of learning in 
general and of physics in particular, prior to instruction, 
and includes a presentation and discussion of Maryland 
Physics Expectations Survey (MPEX) pre-test data.
In Sec. III, a baseline performance analysis of pre-reform
teaching in the introductory, calculus-based mechanics 
course is presented, including data taken with the Force 
Concept Inventory (FCI) and the International English
Language Testing System (IELTS) test of English proficiency.
In Sec. IV, the baseline assessment of Sec. III and
the broader contextual factors from Sec. II are synthesized
to create criteria using an engineering design-based
approach that are then used to evaluate the potential 
efficacy of eight well-known and well-documented PER- 
based innovations. Using the same criteria, modifications 
to the chosen CGPS approach are motivated. In Sec. V,
an analysis of the conceptual learning gains, class exam
performance and at-risk student retention produced by
the modified CGPS approach is presented. In Sec. VI,
we return to the three main research questions, as listed
above, and discuss their answers in light of these results.
We offer concluding remarks in Sec. VII on the efficacy
of the reform, new questions raised by this work, and 
consequent directions for future research. 



II. UAE AND KU CONTEXTS 

In this section, we briefly review the major and rela- 
tively recent historical developments in the Gulf region, 
as they relate to education, for the benefit of the reader 
and to inform the methodology and discussion of this 
study. 



A. A Brief History of Education in the UAE 

Major political and economic changes in the MENA 
region often initiate or come in tandem with large-scale 
educational reforms^ (see RefsJ^iii for further review). 
Education in the lower Gulf coast of the Arabian Penin- 
sula is no exception and has undergone several rapid 
changes in recent history, first with the pearling industry 
boom of the late 19th century, with that industry's col- 
lapse during the Great Depression and the World Wars, 
and with the discovery of oil in the middle of the 20th 
century. Prior to the arrival of European colonial em- 
pires, the regional economy was mostly subsistence and 
did not permit the labor specialization necessary for for- 
mal education. Rather, in small, informal gatherings a 
mutawwa'a, a respected community elder who was of- 
ten the worship leader of the local mosque, would lead 
neighborhood boys in Islamic religious oral recitations 
and teach general wisdom for life. 

During the middle decades of the 19th century, the 
British Empire entered into trade and security agree- 
ments with the coastal sheikhdoms in an effort to se- 
cure sea lanes to India. These "Trucial States" brought 
peace and, combined with warm, shallow seas, a commer- 
cial pearling boom that in turn funded the first formal 
schools. Tragically, the truce agreements that brought 
prosperity also forbade Gulf Arab merchants from trading
pearls outside imperial markets. In the 1920s and
1930s, when Europe was hit by depression and war and
Japan introduced cultured pearls, the Gulf pearling industry
collapsed completely and formal schooling all but
disappeared. The ensuing hardships and lack of access 
to education persisted even after the discovery of oil in 
the Trucial States territory in October, 1969. 

On December 2, 1971, following British withdrawal, 
the seven hereditary monarchies of Abu Dhabi, Ajman, 
Dubai, Fujairah, Ras al-Khaimah, Sharjah, and Umm 
al-Qaiwain declared their formation of the United Arab 
Emirates. Concurrently, the UAE Ministry of Education, 
along with many other federal ministries, was created to
FIG. 1. A schematic of some relevant US and UAE teaching and learning contexts, using the frames of context model of N. D.
Finkelstein. Vertical arrows represent ways in which norms in one frame inform norms in their contained frames. Horizontal
arrows represent ways in which norms in one society might be transferred to parallel frames in the other society.



oversee a new public school system, using curricula im- 
ported mainly from Kuwait and Jordan and primarily 
teacher-driven, rote-learning methods, as textbooks and 
other resources were not widely available. This quickly 
changed as the oil crises of the 1970s, combined with
rapid growth in worldwide oil use, led to huge expansions
in affluence and access to education for the region.
Over the span of a few generations, the society rapidly 
transformed from one of about 80,000 Gulf Arabs, with a 
per capita income of 3K USD (2005 dollars) and an adult 
literacy rate of < 10%, to one that at present has nearly 
6 million people, with expatriate groups from 90 nations, 
a per capita income of 33K USD, and an adult literacy 
rate amongst citizens of ~80%. 

At present, there are 19 institutions of tertiary educa- 
tion in the Emirate of Abu Dhabi alone, including KU. 
A few salient features of the current higher education 
landscape are as follows. Combined, these institutions
have a gross enrollment of about 25-30% of the adult citizen
population, a factor of 5 increase over that of 1970,
and 75% of those enrolled are female. The language
of instruction in most settings is English. Consequently, 
while each institution has a distinct core mission, all that 
teach in English share a need to accommodate a majority 
of English Language Learners (ELLs), graduating from 
the mostly Arabic-based secondary schools. For these 
students, most institutions have a language-conditional 
admission category and a "foundation" or "preparatory" 
program (see a similar example in Ref*i2,), a year- long, 
intensive English and remedial math-and-science curricu- 
lum. As a result, the average time spent studying to 
obtain a bachelor's degree is 5.5 years. Another ongo- 
ing challenge, especially for STEM-focused programs, is 
the relatively small number of students following science 



and mathematics-intensive tracks in secondary school (< 
5%) and selecting to study STEM disciplines at univer- 
sity (< 30%). KU was established in 2008 by royal decree 
and by acquisition of Etisalat University College in Sharjah,
UAE, which now forms its Sharjah campus facility.
The Abu Dhabi campus opened its doors to degree pro- 
gram students in Fall 2009. Currently, the University is 
composed of the College of Engineering only which in- 
cludes the Department of Applied Mathematics and Sci- 
ences where its mathematics and natural sciences faculty 
are employed. All students, about 1000 enrolled total, 
are engineering majors who take two calculus-based in- 
troductory physics courses delivered by the department. 
Across the two campuses, these two courses currently 
serve 250-300 students per year, about 75% of whom are 
UAE citizens. 



B. Student expectations from national culture and
transfer of norms across contexts



Figure 1 shows a schematic representation of some US
and UAE teaching and learning contexts that are relevant
for the secondary implementation of PER-based instructional
strategies, created largely in the US, in their parallel
frames of context in UAE society. Vertical arrows
represent ways in which norms in one frame inform norms
in their contained frames, as described by Finkelstein.
We add horizontal arrows to represent ways in which 
norms in one society might be transferred (e.g. situa- 
tional norms transferred via the classroom behavior of 
US trained faculty when teaching UAE students in the 
UAE) to parallel frames in the other society. For ease 
of reference, we typify the transfers by frame. Type I is 
transfer of norms at the task level (e.g. introducing the
expectation that students be able to solve novel or purely 
conceptual problems). Type II is transfer of situational 
norms (e.g. introducing the expectation that students 
interact with each other during class time). Type III is 
the transfer of idio-cultural norms (e.g. introducing the 
expectation that university "life" means high student autonomy
and self-regulation). Type IV is the transfer of
national/ethno-cultural norms (e.g. introducing the expectation
that one queue for a teller at the bank in a
line).

Transfers of the Type IV kind are highly unlikely in 
the short term (and attempting them is often highly 
unproductive), but faculty awareness of the differences 
in expectations about university, between US national 
and UAE national culture, is important for student mo- 
tivational issues. A US professor's motivational speech 
usually goes something like 'study hard because you're 
paying for tuition and if you do well, you'll make more 
money when you graduate', which is mostly meaningless
in the UAE context. As a society where domestic
tertiary education is relatively new, mostly imported,
and entirely subsidized, UAE citizens have significantly
different views about the role of higher education
in their society as compared to US citizens and
consequently UAE students attend university for differ- 
ent reasons. While Arab Gulf states primarily estab- 
lish universities to enlarge and diversify their private 
sector economies away from oil and reduce unemploy- 
ment, individuals and families in these nations view uni- 
versities as a means to gain social status, increase their 
marriageability, and secure public sector employment.
Essentially all Arab Gulf states offer their citizens ac- 
cess to tuition-free tertiary education and, unless they 
pursue other forms of employment, most graduates de- 
sire and have guaranteed employment with their gov- 
ernments. Amongst degree-holding Omani, Qatari, and 
Emirati citizens, 86%, 87%, and 85% are employed in the 
public sector, respectively. This is in sharp contrast to
US students, who largely attend university to broaden 
their career choices and increase their lifetime earning 
potential. Aside from awareness, it is beyond the scope
of this paper to discuss Type IV transfers further, so we 
focus on Types I - III for the rest of the text. 

We further narrow our scope by assuming that, at the 
task level, the average student is cognitively the same re- 
gardless of culture and that Type I transfers (of published 
PER-based tasks and task-level norms) must and should 
be successful without major modification. The key issue 
then is effective Type II and Type III transfers, those of 
the situational and idio-cultural norms critically linked 
to effective use of PER-based tasks. To maximize the 
likelihood of a successful and sustainable secondary im- 
plementation, this should be done with minimization of 
negative expectancy violation (for students and instructors
alike) as a key goal in the course design. There is
evidence that Gulf Arab university students' expectations
for classroom norms may not be as incompatible
with the idio-cultural values (e.g. student-student interactions,
hands-on activities, equality in teacher-student
interactions, sense-making over answer-making) intrinsic
to successful IE pedagogies as casual observers
of UAE national/ethno-culture might imagine.

In fact, there is an interesting juxtaposition present
in Fig. 1. The seminal work on cultural values
by G. Hofstede would suggest that American culture,
when manifest in the educational situational/idio-cultural
frames, carries expectations of/preferences for:
student-teacher equality, student-centered education, students
initiating communication, open-ended learning
situations, good discussions, tasks with uncertain outcomes
that involve risk estimation and problem solving,
grouping according to tasks, speaking out in class or in
large groups, encouragement for showing individual initiative,
and emphasis on learning 'how to learn'. These
expectations all appear very much in line with those of
functional IE pedagogy, but Hofstede cautions that
factors such as affluence, education, occupation, gender
and age can significantly affect these expectations. Indeed,
as one looks into narrower frames of context for
the US, a large body of research on expectations for introductory
physics courses shows that these are, in fact, not
consistent with the expectations of American university
students. One could speculate at the cause. Perhaps
values implicit in most US public primary
and secondary education are at odds with broader societal
and family values in the US and, as a result, IE-compatible
habits of mind (e.g. divergent thinking) are
devalued and decline with prolonged formal education.
Traditional instruction at the university level would then
be expected by American students as a result of prior
experience rather than culture, and continuing that
instructional mode continues the trend away from expert-like
attitudes toward the course content.

Conversely, the same considerations applied to Gulf
Arab students suggest that as one looks into narrower
frames of context on the right side of Fig. 1, these trends
are reversed as compared to the US scenario. Hofstede's
analysis of broader Gulf Arab culture suggests that in
educational situational/idio-cultural frames, students expect/prefer:
student deference and dependence on teachers,
teacher-centered education, teacher-initiated communication,
highly structured learning situations, teachers
possessing and transferring absolute truths, tasks with
sure outcomes that involve following instructions and no
risk, grouping according to prior affiliations (ethnicity,
family, friendship, etc.), no speaking out in class or
in large groups, discouragement for individual initiative,
and emphasis on learning 'how to do'. These expectations
all appear at odds with IE classroom norms, but
when Hofstede's survey instrument (VSM94)
is given specifically to Gulf Arab university students in
their classroom environments by their teachers (i.e. precisely
in the relevant frame of context), there are significant
modifications. J. Baron surveyed 200 pre-health
sciences students at the University of Sharjah
(UShj) and found that many student classroom expectations
(those associated with power distance and uncertainty
avoidance) were similarly or better aligned with
IE pedagogy than one would expect of general American
culture. In a survey of 219 foundation/preparatory
program students at Qatar University (QU), K. Litvin
and M. McAllister similarly found that students expected
greater equality with their teachers and were more
comfortable with uncertainty in learning tasks and environments
than an analysis of the broader culture would
suggest.



C. Student idio-cultural expectations at KU 

Anecdotally, these authors find that many of the classroom
behaviors exhibited by KU students are consistent
with those reported by Litvin and McAllister, especially
their observations on student reading habits, preparations
prior to class time/lecture, and communication
habits. In both institutions, students coming directly 
from high school have always had all materials (paper, 
pens, books, etc.) provided for them on a same-day basis 
and rarely realize that they need to bring or take notes 
during class, so these behaviors have to be taught and 
reinforced. Also, Gulf Arab culture has a strong oral tra- 
dition and reading for pleasure is quite rare, especially 
since there has never been a wide variety of books available
until very recently. As a result, students at QU and
KU rarely read their course textbooks unless there are 
direct and regular grade incentives to do so. Further- 
more, while students generally do prefer interactive and 
group learning, they do still prefer to work with family, 
tribe, or close friends in class and on occasion, teaming 
a given student with certain other students can lead to 
tense situations, as was also observed at QU. 

Perhaps most importantly, similar to QU, these au- 
thors find that while KU students do expect student- 
centeredness and student-teacher equality in most re- 
gards, there are important ways in which the deference to 
authority, that is expected in the broader culture, man- 
ifests in the university setting, particularly around 'non- 
curricular' issues like grievances about grades, classroom 
management, assignment deadlines, etc. KU students 
will rarely confront a faculty member personally with a 
request for special treatment (e.g. permission to make 
up an assignment due to illness, absence for travel, etc.) 
or with a complaint (e.g. that pace of lecture is too fast, 
grading of a previous assignment unfair, etc.). Instead, 
students will communicate in ways so that the issue is 
presented as one belonging to a third party, which ap- 
pears to make such conversations more passive and more 
comfortable for them. For special requests, students will 
often send a friend to speak with the faculty on their be- 
half. The student's friend finds it easier to make a special 
request of the faculty since the request is 'not for me' and 
the student themself likely feels that they are being more 



submissive and respectful of the teacher's authority by 
not asking them 'to their face'. Even more importantly, 
for complaints, students may instead speak directly to 
the faculty's supervisor or other high-level administra- 
tor, again arranging the discussion about the issue so 
that it is about someone who is not personally present 
and thereby avoids direct confrontation and offense to 
the teacher. 

This behavior can have very serious effects on a course 
if not anticipated by all parties involved, especially since 
US trained faculty and administrators generally consider 
student complaints that 'go over a faculty's head' (such 
as to an ombudsman) as indicative of serious faculty 
abuse. If faculty and administrators are unfamiliar or 
misinformed about Gulf Arab culture, they may not en- 
tirely grasp what the reasonable expectations of students 
are in a given situation and misjudge the seriousness of 
a complaint. Students also soon realize that 'cultural 
offense', and the staff uncertainty and anxiety it pro- 
duces, can be a convenient "Trojan Horse" for getting 
any complaint to be taken seriously. Indeed, during the 
2009-2011 period, there were multiple examples of essen- 
tially instant changes to the management of courses that 
were prompted by relatively minor complaints, often to 
the surprise of faculty and students alike. This experi- 
ence made it clear to these authors that any pedagogical 
reform project would have to anticipate the nature of stu- 
dent expectancy violation and prepare project evaluators 
and administrators in advance, to expect complaints and 
to take care separating justifiable from unjustifiable ones. 
Likewise, to increase the likelihood of success, faculty 
would have to 'pick their battles' and avoid expectancy 
violation that is pedagogically unnecessary (e.g. issues 
of Type IV transfer of norms discussed above). 

There are also some significant differences between student
expectations at UShj (as studied by Baron) and QU
(as studied by Litvin and McAllister) and those at KU. The most important
one is that the students are separated by gender at
QU and UShj, and they are not separated at KU. Classroom
geometry in KU classrooms typically has left-right
symmetry and a large center aisle, communicating unconsciously
to students that the genders should sit on opposite
sides, which they invariably do. So, while the classroom
spaces are co-educational, this is not without some
tension, and in a 2011 town hall discussion with students, 
many students expressed a desire that learning activi- 
ties be further separated along gender lines. Specifically, 
some students (from both genders) felt more comfort- 
able addressing faculty when members of the opposite sex 
were not present and that they could better learn subject-matter
content in single-gender settings, with gender integration
used only for team-based projects. Consequently,
teaming male and female students in groups during
class must be done very thoughtfully, if done at all.
It is very likely that doing so may not produce learning 
gains that are worth the discomfort produced, or worse, 
that learning gains could be undermined by lack of stu- 
dent engagement caused by their discomfort. 
FIG. 2. MPEX pre-test scores.



D. Students' task-level expectations for learning
physics content



In this section, we are concerned with students' expectations
of physics learning itself, at the narrowest frame,
the task-level context. To gauge students in this regard, 
we have surveyed two groups, one from academic year 
2010 and one from the pilot CWP group in 2011, using
the Maryland Physics Expectations (MPEX) survey.
Figure 2 shows the overall score on the MPEX, administered
pre-instruction, and displayed in an agree-disagree
(A-D) plot as done by Redish et al. (1998). Overall,
both year groups test similarly (errors in the mean are 
only a few percent, smaller than the icons used in the 
plot). Thus, there is some confidence that differences be- 
tween traditionally taught students (2010) and students 
in the CWP pilot (2011) are not caused by differences 
in expectations. Note also that compared to the MPEX 
calibration group and student groups surveyed in Redish
et al.'s original publication ("US Tertiary" in the
figure), KU students respond significantly less favorably
on the MPEX pre-test. In fact, KU pre-test scores on
MPEX are consistent with US post-instruction scores as
found by Redish et al. (1998). A marker for a random
response is added for benchmarking purposes and adds 
some confidence that students are answering thoughtfully 
(Cronbach's alpha for the MPEX overall is 0.79). 
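
The reliability figure quoted above is a Cronbach's alpha, which can be computed from a matrix of item responses using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The following is a minimal sketch of that calculation; the function name `cronbach_alpha` and the toy responses are illustrative assumptions and are not the survey data used in this study.

```python
def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(responses[0])  # number of items in the scale

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

    # Variance of each item across respondents
    item_vars = [variance([r[i] for r in responses]) for i in range(k)]
    # Variance of each respondent's summed score
    total_var = variance([sum(r) for r in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses (4 students x 3 items), illustration only.
toy = [[1, 2, 1], [3, 3, 4], [4, 5, 4], [2, 2, 3]]
alpha = cronbach_alpha(toy)
```

Values near 1 indicate that the items vary together; perfectly correlated items give exactly 1.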

The clusters on MPEX that arguably have the most 
bearing on classroom norms at the task-level are the "In- 
dependence" and "Effort" clusters (questions 1, 8, 13, 
14, 17, 27 and questions 3, 6, 7, 24, 31 respectively). 
This is because both clusters were constructed and validated
for measuring expectations about student behaviors.
As stated by Redish et al. (1998), the independence cluster measures
whether or not 'learning physics' "means receiving
information or involves an active process of reconstruct- 
ing one's own understanding" and for the effort cluster, 
"whether [students] expect to think carefully and eval- 
uate what they are doing based on available materials 



and feedback or not". KU student responses on these 
two clusters also show the most dramatic differences from 
their overall response as shown in Fig. 2.

Figure 3 shows the A-D plot for the independence cluster.
Clearly, KU students respond very unfavorably to
some items in this cluster. The most unfavorable responses
are to items #1 and #14, as was the case in the
original MPEX study; however, the degree of unfavorability
is significantly greater. Item #1 states:

All I need to do to understand most of the 
basic ideas in this course is just read the text, 
work most of the problems, and/or pay close 
attention in class. 

Only 8% of students on average (9% for year group 2010, 
7% for 2011) disagreed with this statement (the favor- 
able response). No student strongly disagreed. On aver- 
age 77% of students agreed with item #1 (76% for KU 
2010 and 78% for KU 2011), meaning on average only 
15% of students responded neutrally. A similar pattern 
is present in responses to item #14 which states: 

Learning physics is a matter of acquiring 
knowledge that is specifically located in the 
laws, principles, and equations given in class 
and/or in the textbook. 

Only 17% of students on average (18% for KU 2010, 14% 
for KU 2011) disagreed with this statement (the favorable 
response). Again, no student strongly disagreed. On av- 
erage 56% of students agreed with item #14 (59% for KU 
2010, 50% for KU 2011), meaning on average 27% of students
responded neutrally. On the remaining items in the
independence cluster, students on average responded favorably
39% versus 35% unfavorably and 26% neutrally,
still slightly lower but somewhat more consistent with
US tertiary pre-test scores presented by Redish et al. (1998). (Note:
Cronbach's alpha score for the independence cluster is
0.58).
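
The favorable, unfavorable and neutral percentages quoted for items #1 and #14 are the quantities behind each point on an A-D plot. As a rough sketch of that bookkeeping, the snippet below tallies 5-point Likert responses for a single item. The function name `ad_point`, the 1-5 coding, and the sample list are assumptions made for illustration; the sample is constructed only to reproduce the average percentages reported for item #1.

```python
def ad_point(responses, favorable="disagree"):
    """Return (favorable %, unfavorable %, neutral %) for one survey item.

    responses: integers 1..5, where 1 = strongly disagree and 5 = strongly agree.
    favorable: which side ("agree" or "disagree") counts as the favorable response.
    """
    n = len(responses)
    agree = sum(1 for r in responses if r >= 4)
    disagree = sum(1 for r in responses if r <= 2)
    neutral = n - agree - disagree
    if favorable == "disagree":
        fav, unfav = disagree, agree
    else:
        fav, unfav = agree, disagree
    return tuple(100.0 * count / n for count in (fav, unfav, neutral))


# Hypothetical sample of 100 responses built to match the item #1 averages
# (8% disagree, 77% agree, 15% neutral); not the actual student data.
sample = [2] * 8 + [4] * 77 + [3] * 15
fav_pct, unfav_pct, neutral_pct = ad_point(sample, favorable="disagree")
```

The neutral share is whatever remains after the favorable and unfavorable shares are counted, which is how the 15% and 27% neutral figures in the text follow from the other two percentages.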

This stands somewhat in contrast to cultural values 
data on similar students discussed in the previous section 
and helps to triangulate on some of the anecdotal remarks 
about KU students that were added there. Students do 
expect student-centeredness and student-teacher equality,
but this appears to be mainly when they are engaging 
the instructor of the course. When engaging the content 
of the course, clearly, based on the text of item #1 and 
#14, KU students exhibit a particularly high dependence 
on perceived authoritative sources of information (text, 
problems, or 'in class' [instructor]) as compared to US 
students on average. 

As shown in Fig. 4, the effort cluster from the MPEX data
presents a more favorable, though not entirely clear, picture.
Clearly, KU students respond favorably to most items in this
cluster, much more like their US tertiary counterparts prior to
instruction. The notable and somewhat confusing exceptions are
items #6 and #24, which received on average 48% and 35% favorable
responses, respectively. Item #6 states:
FIG. 3. MPEX pre-test scores for the independence cluster.



FIG. 4. MPEX pre-test scores for the effort cluster. 



I spend a lot of time figuring out and understanding
at least some of the derivations or
proofs given either in class or in the text.

A 48% agreement (the favorable response) indicates that studying
either proofs themselves, or proofs in the contexts in which they
are given (in class or in the text), is not widely deemed an
important behavior for learning physics. Yet, item #7 states:

I read the text in detail and work through
many of the examples given there.

The level of agreement (the favorable response) is 70% 
on average. Taken together, it would appear that KU 
students believe that engagement with the course text- 
book is important for learning physics, but it is most 
valuable as a source of worked examples, not as a source 
of proofs and derivations. This is somewhat in contradiction
to the instructors' anecdotal evidence, discussed above, that
many students do not read the course text at all. It might be
the case, however, that the discrepancy is the result of the
wording of item #6, which contains so many compound constructions
("figuring out and understanding", "derivations or proofs",
"in class or in the text") that ELL students may not be sure
what the statement is asking them to reflect upon. (Note: In
support of this possibility, the Cronbach's alpha score for the
effort cluster is 0.60, but if item #6 is removed, it improves to
0.70, the largest improvement for any single-item removal.)
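
The "alpha if item removed" check described in the note above can be sketched as follows. This is a minimal illustration using the standard formula for Cronbach's alpha, not the authors' analysis script; the function names and the toy response data are assumptions made here for illustration.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a cluster of survey items.

    items: list of per-item score lists, one entry per student,
    all lists the same length (hypothetical layout).
    """
    k = len(items)
    # Total score per student across all items in the cluster.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:])
            for i in range(len(items))]


# Toy data (invented): the third item tracks the other two poorly.
responses = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 5],
    [5, 1, 4, 2, 3],
]
full = cronbach_alpha(responses)
dropped = alpha_if_item_deleted(responses)
worst_item = dropped.index(max(dropped))  # removal that raises alpha the most
```

An item whose removal raises alpha the most is the one least consistent with the rest of the cluster, which is the role the text assigns to item #6.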

Item #24 is also responded to relatively unfavorably,
with only 35% agreeing. It states: 

The results of an exam don't give me any use- 
ful guidance to improve my understanding of 
the course material. All the learning asso- 
ciated with an exam is in the studying I do 
before it takes place. 

This response is strangely at odds with student responses 
to item #31: 



I use the mistakes I make on homework and
on exam problems as clues to what I need to
do to understand the material better.

to which 83% of students agree, the most favorable of any in the
effort cluster. Speculating, this discrepancy could be the result
of the long tradition in UAE public schools of following a
British-style model for testing, where learning is assessed by a
single, high-stakes exam at the end of the course (often carrying
60% or more of the course's grade weight), as opposed to the model
in typical US physics courses, which feature 2-3 midterm exams and
weekly graded homework spread throughout the course. In the former
case, there would be little reason for a student to expect to need
to study their mistakes on an exam, since the course is finished.
Confirming this, however, requires a more specific investigation.
Nevertheless, the fact that 83% of students agree with the
statement of item #31 and expect to study their mistakes on
homework assignments in order to understand the course material is
encouraging.



E. Summary Discussion of Context and 
Expectations 

Student expectations associated with classroom norms at the
various levels of context have been assessed. Data on UAE
nationals' cultural values and how these inform expectations in
educational environments suggest that essential classroom norms
for effective PER-based instruction will not be totally rejected.
On the contrary, at the ethno/national cultural level, there is
some evidence that course reform might be better received by
students at KU than among students at US institutions. At the
idio-cultural level, despite the benefits reported in the
literature on cooperative, mixed-gender, female-majority teams
(e.g. Ref,-), it seems unlikely in the KU context that the student
learning gains amongst freshman students will outweigh the costs,
in terms of negative expectancy violation, and the consequent
potential for administrative course intervention caused by
assigning students to such teams. At the task level, on the other
hand, there is strong evidence that KU students rely heavily on
their instructors' authority and expect passive memorization of
equations and worked examples from textbooks to produce
satisfactory learning; these are student expectations that are all
worth violating in the reformed course's design.

[Figure 5: four histogram panels of Force Concept Inventory scores;
horizontal axis "No. Correct" (3 to 30); legend: pre-test, post-test.]

FIG. 5. Histogram of pre- (black) and post-test (white) Force
Concept Inventory data. a.) FCI data from all traditionally taught
students. b.) Data from all traditionally taught, directly admitted
students. c.) Data from all traditionally taught, conditionally
admitted students. d.) Data from all conditionally admitted
students taught in the CWP pilot.



III. BENCHMARKING TRADITIONAL 
INSTRUCTION 



KU's original model for science and engineering courses with labs
is a so-called "3+1" inclusive model: 3 credit hours per week are
devoted to lectures (and therefore 3 contact hours of lecture) and
1 credit hour is devoted to a laboratory session, which typically
meets once per week for 2.5 to 3 contact hours, to create a
4-credit-hour course. The lecture and laboratory portions of the
course form a single whole and students cannot register for one
without the other. Laboratory grades contribute a percentage
toward the overall weight of coursework and are added to
contributions from exams to calculate a single course grade.
Lectures in the first-semester calculus-based physics course cover
material typically included in introductory mechanics: linear,
circular, and rotational kinematics; forces and torques; Newton's
Laws; work and energy; conservation laws for energy, linear
momentum, and angular momentum; statics; and some basic topics in
thermal physics. The original format of the course meets all of
Hake's criteria for classification as a "traditional" course:
didactic lectures to a passive student audience, "recipe" or
"verification" labs, and all formative and summative assessment
with individual student exams featuring exclusively algorithmic
problems (alternatively called end-of-chapter or
"Halliday-Resnick" problems; see, e.g., the references therein).
Readers will notice the absence of recitation or
tutorial/problem-solving sessions. A typical semester would see
the equivalent of 45 50-minute lectures, delivered either in three
50-minute sessions or in two 75-minute sessions per week for 15
weeks, and approximately 10 laboratory sessions spaced over the
calendar to roughly follow the lecture and avoid weeks with
holidays. As is often the case in such courses at other
universities, keeping the lecture and lab portions of the course
synchronized is difficult, and the topics being treated in the
laboratory could often be ahead of or behind the lecture by as
much as two weeks.

There are also several features of the University's learning
environment that are unique, owing to its status as a recent
start-up institution, and that are worth mentioning. First, all
students are granted a full scholarship as part of their admission
to the University. This is done to attract top talent and a large
talent pool to aid in the start-up phase. Consequently, there are
certain student traits, related to socio-economic background and
personal motivation, that are difficult to compare with those of
student bodies in many other universities. Second, as there are as
yet no dormitory facilities on the Abu Dhabi campus, most students
are commuter students in one form or another. Access to campus is
restricted in evening hours and there are no 24-hour facilities
available to students. This necessarily means that some portion of
their out-of-class studying occurs off campus, which will be
relevant when language issues are discussed below (see Sec. V).






A. Pre-Reform Baseline Performance Assessment 

Starting from Fall 2009, the Force Concept Inventory (FCI) was
administered to students both pre- and post-instruction, in
English. This was repeated in two subsequent semesters in which
the traditional course was offered, Spring 2010 and Fall 2010;
however, logistical issues prevented administration of the Spring
2010 FCI post-test. In Fall 2010 and Spring 2011, students had a
choice of taking the FCI in Arabic or English, but this had no
significant effect on average scores, pre- or post-instruction,
over Fall 2009 or Spring 2010. Figure 5 shows the distribution of
all FCI data considered in this work. Table I shows class-averaged
pre- and post-instruction FCI scores for all semesters for which
data have been gathered. A strict matching condition has been
applied, such that any student who did not complete both pre- and
post-tests is removed from the dataset prior to analysis. The
uncertainties quoted for pre-test and post-test scores are errors
in the mean ($\sqrt{\sigma^2/N}$). The uncertainties for the
Hake's gain scores are propagated from the uncertainties for mean
pre-test and mean post-test scores, where the Hake's gain score
itself is calculated in the usual way as

$$\langle g \rangle_{\text{Hake}} = \frac{\langle \text{Post-test} \rangle - \langle \text{Pre-test} \rangle}{100 - \langle \text{Pre-test} \rangle},$$

and where $\langle\;\rangle$ symbols indicate class-averaged scores.
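
As a minimal sketch of the gain calculation, the snippet below evaluates the class-averaged Hake gain and one possible first-order propagation of the pre- and post-test errors, assuming the two errors are independent. The text does not spell out the propagation formula it used, so the uncertainty computed here is an assumption and need not reproduce the tabulated values. Scores are taken as fractions (maximum 1.0) rather than percentages, and the function names are illustrative.

```python
import math


def hake_gain(pre, post, max_score=1.0):
    """Class-averaged normalized gain <g> = (post - pre) / (max - pre)."""
    return (post - pre) / (max_score - pre)


def hake_gain_error(pre, post, pre_err, post_err, max_score=1.0):
    """First-order error propagation, assuming independent errors.

    This is one common choice, not necessarily the one used in Table I.
    """
    d_post = 1.0 / (max_score - pre)                    # dg/d(post)
    d_pre = (post - max_score) / (max_score - pre) ** 2  # dg/d(pre)
    return math.hypot(d_post * post_err, d_pre * pre_err)


# Fall 2009 direct-entry averages quoted in Table I: 0.35 and 0.45.
g = hake_gain(0.35, 0.45)
g_err = hake_gain_error(0.35, 0.45, 0.03, 0.04)
```

With these inputs the gain evaluates to about 0.15, matching the tabulated value for that row.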



B. Combining all data from traditional instruction 
on the AD Campus 

Given the start-up status of the university, class sizes for the
Fall 2009 offering of the course were quite small and it was some
time before enough data could be gathered to begin making sound
decisions about potential reform approaches. Near the end of Fall
2010, all data gathered up to that time were pooled and analyzed
to see if the data sets from the three traditional offerings (Fall
2009, Spring 2010, and Fall 2010) on the AD campus could be merged
and analyzed as a single data set. With small sample sizes, it is
difficult to confidently verify the normality of the data set, so
we used a non-parametric test, the Wilcoxon (or Mann-Whitney) Rank
Sum test (hereafter simply "rank-sum" test), to compare means and
distributions. Comparing FCI pre-test scores for these three
offerings, we find the following Z-statistics for the rank-sum
test as applied pair-wise to the three data sets: Fall 2009 and
Spring 2010, Z = -0.90 and two-tailed p-value p ≈ 0.367; Fall 2009
and Fall 2010, Z = -0.02 and two-tailed p-value p ≈ 0.985; and
Fall 2010 and Spring 2010, Z = 1.18 and two-tailed p-value
p ≈ 0.239. Since all of the two-tailed p-values are much greater
than 0.05, we conclude that there are no significant differences
in the FCI pre-test scores for students in these three offerings.
This comparison was repeated using a parametric test, Welch's
t-test, as a check on this conclusion, even though it is difficult
to verify normality in any of these individual data sets. The
resulting two-tailed p-values are similarly large and likewise
suggest that there are no statistically significant differences
between the three sets of FCI pre-tests.
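
The pair-wise comparison can be illustrated with a small, self-contained rank-sum implementation. It uses the normal approximation with a tie correction and no continuity correction; whether the authors' software made the same choices is not stated, so treat this as a sketch under those assumptions. The function names are illustrative.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(x, y):
    """Z statistic of the Wilcoxon rank-sum test (normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:n1])  # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    return (w - mean_w) / math.sqrt(var_w)


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Applying `two_tailed_p` to the Z-statistics quoted above (-0.90, -0.02, 1.18) gives values close to the p-values reported in the text.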



C. Pre-instruction evidence of a heterogeneous 
student population 

Pre-test data gathered from Fall 2009 until Spring 2011 
on the AD campus indicate that the student body is 
unusually heterogeneous and can be consistently decom- 
posed into two major groups based on their admissions 
type: (1) direct-entry students (those directly admit- 
ted to credit-bearing, degree-granting programs) and (2) 
preparatory-track students. The main criteria for determining
along which path a student enters are their performance on an
English language equivalency test (most often IELTS, occasionally
TOEFL IBT), their secondary school percentile, performance on a
pre-admissions battery of tests (testing basic mathematics
knowledge and computer literacy), and performance in
an on-site interview. This two-component feature of our
student body is important to characterize and monitor 
quantitatively since, as a start-up institution, teaching 
across all subjects and at all levels is in a constant state of 
experimentation and flux and without monitoring it can 
be very difficult to attribute cause for improved learning 
to particular pedagogical changes. In this section, we ini- 
tially focus on the data set from the Abu Dhabi campus, 
since the sample sizes are much larger, to answer some 
basic research questions. Later, we will justify merging 
the data sets, after checking if they are distributed simi- 
larly, and analyzing them jointly. 

Coincidentally, all CWP pilot students on both campuses were, as
indicated in Tab. I, admitted to the University through the
preparatory program; however, it is important to determine how
changes to preparatory program teaching (which includes conceptual
and algebraic physics courses) during 2010 may have altered their
pre-test performance compared to previous generations of
preparatory-track students. Table II shows FCI pre-test scores,
descriptive statistics, and rank-sum Z-statistics for comparison
of direct-entry and prep populations and the CWP pilot students on
the Abu Dhabi campus. Students repeating the course (who have
therefore taken the FCI multiple times previously) are removed
from the data set; however, we do not enforce strict matching here
(i.e. it is not necessary that a student have an FCI post-test to
be included) since we are not calculating or comparing gains. The
rank-sum test is used to compare average scores on the various
pre-assessments with the intent of determining if dissimilarities
between the populations are statistically significant. The
rank-sum test is an appropriate statistical test to use initially,
as it is a non-parametric test and can be used to analyze data
sets that are discrete and have an unknown or indeterminate
distribution (e.g. negative exponential, Gaussian, etc.). Since we
have removed students repeating the course, we satisfy the
necessary precondition for using the rank-sum test, namely that
the compared data sets are independent.

TABLE I. Force Concept Inventory data spanning Fall 2009 until Spring 2011 semesters for KU's first-semester, calculus-based
physics course.

Semester (Campus^a:Mode^b)   Population     Size (N)   ⟨Pre-test⟩     ⟨Post-test⟩    ⟨g⟩_Hake
Fall 2009 (AD:T)             all            73         0.31 ± 0.02    0.39 ± 0.03    0.08 ± 0.04
                             direct-entry   24         0.35 ± 0.03    0.45 ± 0.04    0.15 ± 0.05
                             prep-track     49         0.26 ± 0.02    0.30 ± 0.02    0.05 ± 0.03
Spring 2010 (AD:T)           all            64         0.33 ± 0.02    --             --
                             direct-entry   24         0.40 ± 0.04    --             --
                             prep-track     40         0.28 ± 0.02    --             --
Fall 2010 (AD:T)             all            56         0.30 ± 0.02    0.36 ± 0.03    0.09 ± 0.04
                             direct-entry   16         0.40 ± 0.04    0.60 ± 5       0.33 ± 0.06
                             prep-track     40         0.27 ± 0.01    0.26 ± 1       -0.01 ± 0.02
Fall 2010 (Shj:T)            all            28         0.23 ± 0.02    0.28 ± 0.02    0.06 ± 0.03
                             direct-entry   6          0.16 ± 0.02    0.24 ± 0.03    0.10 ± 0.04
                             prep-track     22         0.25 ± 0.02    0.30 ± 0.02    0.07 ± 0.03
Spring 2011 (AD:CWP)         all            57         0.26 ± 0.01    0.35 ± 0.02    0.12 ± 0.03
                             direct-entry
                             prep-track     57         0.26 ± 0.01    0.35 ± 0.02    0.12 ± 0.03
Spring 2011 (Shj:CWP)        all            21         0.36 ± 0.05    0.45 ± 0.05    0.14 ± 0.07
                             direct-entry
                             prep-track     21         0.36 ± 0.05    0.45 ± 0.05    0.14 ± 0.07

a AD is for the Abu Dhabi campus, Shj is for the Sharjah campus.
b T is for Traditional and CWP is for Collaborative Workshop Physics.

As shown in the lower half of Tab. II, the average FCI pre-test
score of direct-entry students is significantly different from
that of preparatory students or of students in the CWP pilot. As
an example, comparing the direct-entry students with preparatory
students, we take the null hypothesis to be that their respective
mean FCI pre-test scores are equal. As shown in Tab. II, the test
statistic Z for this comparison is 4.59, which has a corresponding
two-tailed p-value of less than 0.00001. We can therefore
confidently reject the null hypothesis. FCI pre-test scores for
direct-entry students are 11 percentage points higher on average
than those of preparatory students (0.38 versus 0.27) and the
difference is statistically significant. We also find that FCI
pre-test scores of direct-entry students are likewise higher than,
and significantly different from, those of our CWP pilot students
(p < 0.001). When comparing preparatory students to the CWP pilot
students, however, we find that the null hypothesis, that their
mean FCI pre-test scores are equal, cannot be confidently rejected
(p ≈ 0.66). We can therefore conclude that, despite changes to
preparatory physics courses and pedagogy after Fall 2009 but prior
to the CWP pilot, the FCI pre-test scores of these two groups show
no statistically significant differences.

As stated above, English language equivalency is a major factor
deciding whether a student is admitted directly to a degree
program or admitted conditionally to the preparatory program.
Also, one of the major changes to the preparatory program prior to
the launch of the CWP pilot was that the required IELTS score for
direct admission was raised from an IELTS overall score of 5.5 to
an overall score of 6.0. Therefore, as part of our
characterization and monitoring of students' pre-instruction
traits, we analyze IELTS overall scores to determine if there are
significant sub-populations based on language ability.

Table II shows IELTS overall scores, descriptive statistics, and
rank-sum Z-statistics for comparison of direct-entry and prep
populations and the CWP pilot students. As in the above analysis
of FCI pre-test scores, students repeating the course are removed
from the data set to avoid double-counting their IELTS overall
scores (unlike the FCI, they do not retake IELTS when repeating
the physics course). Double-counting would invalidate the use of
the rank-sum test, which is again used to compare the different
populations by admissions type. The main reason the rank-sum test
is the appropriate statistical test to use here is that IELTS
scores are discrete and very coarse (coming in 0.5 increments over
a range of 4.0 to 9.0), and applying normality and parametric
tests to small IELTS data sets is rather uninformative, as they
often trivially pass such tests. As shown in the lower half of
Tab. II, the average IELTS overall score of direct-entry students
is significantly different from that of preparatory students or of
students in the CWP pilot (two-tailed p < 0.001). The difference
between average IELTS scores for CWP pilot students and other
preparatory students is not as pronounced but is still significant
(p < 0.001). Other preparatory students have more broadly
distributed IELTS scores, while CWP IELTS scores peak sharply at
6.0 (more than 50% of CWP students have IELTS 6.0).

There is one last quantitative measure of the difference 
between the direct and preparatory admitted students 
TABLE II. Some average pre-test scores for direct entry, prep-track and CWP pilot student populations on the Abu Dhabi
campus.

                                           Population based on admission type
Assessment        Descriptive statistic    All           Direct        Prep          CWP pilot population
                  N                        234           62            102           48
FCI Pre-test      Mean score (%)           0.30 ± 0.01   0.38 ± 0.02   0.27 ± 0.01   0.28 ± 0.02
                  Min. score (%)           [illegible]   [illegible]   [illegible]   [illegible]
                  Max. score (%)           [illegible]   [illegible]   [illegible]   [illegible]
                  Variance (σ^2)           [illegible]   [illegible]   [illegible]   [illegible]
                  rank-sum Z-statistic     Direct vs. Prep: 4.59 (a); Direct vs. CWP: 3.49 (b); Prep vs. CWP: 0.44 (c)
IELTS Overall (d) Mean score (%)           6.0           6.4           5.7           6.1
                  Min. score (%)           4.5           5.0           4.5           5.0
                  Max. score (%)           8.5           8.5           8.5           7.5
                  rank-sum Z-statistic     Direct vs. Prep: 5.13 (e); Direct vs. CWP: 6.94 (f); Prep vs. CWP: 3.87 (g)

a p < 0.001
b p < 0.001
c p ≈ 0.66
d see Tab. III for a concordance with TOEFL IBT scores
e p < 0.001
f p < 0.001
g p < 0.001

TABLE III. Reproduction of Table 7 from Ref.*^, giving a
concordance of TOEFL IBT and IELTS test scores.

IELTS Score   TOEFL Score
9.0           118-120
8.5           115-117
8.0           110-114
7.5           102-109
7.0           94-101
6.5           79-93
6.0           60-78
5.5           46-59
5.0           35-45
4.5           32-34
0-4.0         0-31


and that is the university math placement test, which is given to
prospective students prior to joining the university. This test is
a quantitative, 5-item multiple-choice test which covers a variety
of pre-calculus mathematics knowledge. The result is one of the
criteria used to determine if a student should be admitted and, if
so, directly or into the preparatory program. Consequently, a
difference between preparatory and direct-entry scores is
expected, especially since the test is not used as an exit
criterion for the preparatory program, as is the case with the
IELTS test. Unfortunately, our data set for this score is limited;
however, based on those scores in our possession (about 1/3 of the
234 entries described in Tab. II), we find that the average score
for direct-entry students is about 20% higher than that of
preparatory students (p < 0.01, two-tailed). This difference may
also have limited predictive validity, since preparatory program
students undergo up to a year of remedial mathematics instruction
before leaving the preparatory program. Having a measure of prep
students' mathematical ability upon exit from the preparatory
program would be a much more useful and comparable metric, so as
of the writing of this report (Fall semester 2011), these authors
have partnered with university mathematics colleagues to share
data from a mathematics pre-test they are using for their own
pedagogical reform project for freshman calculus courses. The
analysis of these data and of their usefulness for the CWP project
is ongoing and only in preliminary stages; however, we see that a
gap persists and that direct-entry students (N = 26) still perform
on average 16% higher (p < 0.01, two-tailed) than students exiting
the preparatory program (N = 41).

D. Combining data from the KU Sharjah Campus 

Beginning in Fall 2010, data of the kind presented 
above has also been gathered from the university's cam- 
pus in nearby Sharjah, UAE. We now ask of the combined 
data if there are any statistically significant differences in 
average pre-test scores. The above analysis for the larger 
AD campus data set gives good evidence that the respec- 
tive campus populations should first be broken down by 
admissions type before being compared (e.g. direct en- 
try students are compared to direct entry students across 
campuses, etc.). Also, we apply further statistical tests 
to determine if the AD data is normally (Gaussian) dis- 
tributed, so as to avoid more computationally-intensive, 
non-parametric tests, like the rank-sum test. 

Table IV shows the results of tests to determine if the
TABLE IV. Normality and F-tests for AD campus pre-test data.

                                         Population based on admission type
Assessment         Normality and F-test  All           Direct        Prep          CWP pilot population
FCI Pre-test       Variance              [illegible]   [illegible]   [illegible]   [illegible]
                   K^2-test (a)          35.50 (b)     4.34 (c)      1.24 (d)      10.78 (e)
                   F-test p-values (f)   Direct vs. Prep: 0.0001; Direct vs. CWP: (0.0007); Prep vs. CWP: (0.7454)
IELTS Overall (k)  Variance              0.6245        0.6682        0.6626        0.3113
                   K^2-test              17.11 (g)     1.67 (h)      23.44 (i)     1.04 (j)
                   F-test p-values       Direct vs. Prep: (0.9640); Direct vs. CWP: 0.0078; Prep vs. CWP: (0.0064)

a specifically, the D'Agostino-Pearson Omnibus test
b p < 0.001
c p ≈ 0.11
d p ≈ 0.54
e p < 0.005
f two-tailed
g p < 0.005
h p ≈ 0.43
i p < 0.001
j p ≈ 0.60
k see Tab. III for a concordance with TOEFL IBT scores



AD campus pre-test data are normally (Gaussian) distributed and,
if so, how their variances compare. The first test applied is the
D'Agostino-Pearson Omnibus $K^2$-test, which is itself the
combination of two tests, comparing both the skewness (asymmetry)
and the kurtosis (peaked-ness) of the data to those of the normal
distribution. The null hypothesis in this case is that the data
are indeed normally distributed. If true, the test statistic $K^2$
is chi-squared distributed with 2 degrees of freedom. Clearly, for
FCI pre-test and IELTS data, the scores for all students together,
regardless of population, are not normally distributed (p < 0.001
and p < 0.005, respectively). This is expected and consistent with
our finding above, that the mean performance of the
sub-populations is distinct and the differences between the mean
scores are statistically significant.
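
Because the test statistic follows a chi-squared distribution with 2 degrees of freedom under the null hypothesis, its p-value has a closed form: the survival function of that distribution is exp(-x/2). The short sketch below applies this to convert a $K^2$ value into a p-value; the function name is illustrative, and computing $K^2$ itself from raw data is not shown.

```python
import math


def k2_p_value(k2):
    """p-value of a K^2 statistic under chi-squared with 2 degrees of freedom.

    For 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    return math.exp(-k2 / 2.0)


# K^2 values quoted in Table IV for the FCI pre-test (Direct, Prep, CWP pilot).
p_direct = k2_p_value(4.34)
p_prep = k2_p_value(1.24)
p_cwp = k2_p_value(10.78)
```

These evaluate to roughly 0.11, 0.54, and just under 0.005, in line with the footnoted p-values for those entries.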

When addressed separately, some assessment data for some of these
sub-populations are consistent with normally distributed data. FCI
pre-test scores for direct-entry students and preparatory students
not involved in CWP are consistent with being normally distributed
(p ≈ 0.11 and p ≈ 0.54, respectively); however, pre-tests for CWP
students are likely not (p < 0.005). For IELTS scores, direct and
CWP data are consistent with being normally distributed (p ≈ 0.43
and p ≈ 0.60, respectively), but those of other preparatory
students are not (p < 0.001). To this last point, however, we
notice in the raw data for preparatory students that there are
some IELTS and FCI scores that are reasonable candidates for
rejection as outliers.

We apply Chauvenet's criterion to investigate possible outliers.
IELTS scores for preparatory students show a 'second bump' at
IELTS 7.5. Out of 105 such students, five have IELTS scores
scattered above 7.0. To apply Chauvenet's criterion for the set
{x} of N data we calculate n such that

$$n = N \times \mathrm{Prob}(\text{outside } t\sigma),$$

where

$$t = \frac{|x_{\text{suspect}} - \bar{x}|}{\sigma},$$

$x_{\text{suspect}}$ is the suspicious data point, $\bar{x}$ is
the mean of the data, and $\sigma$ is the standard deviation,
assuming the data are normally distributed. The parameter n = 2.7,
0.4, and 0.14 for IELTS scores 7.5, 8.0, and 8.5, respectively.
Whenever n < 0.5, there is reason to consider rejecting a data
point from a set that one expects should be normally distributed.
If we then reject both the 8.0 and 8.5 IELTS scores, the $K^2$
value for the preparatory IELTS data calculated in Tab. IV drops
dramatically, from 23.44 to 15.87. If we further reject two of the
IELTS 7.5 data points, consistent with Chauvenet's criterion, the
$K^2$ value drops negligibly. If we drop all five data points with
IELTS 7.5, the $K^2$ value falls to 4.14 (with corresponding
p ≈ 0.12). So, while rejecting these seven data points is not
strictly justifiable, doing so does show that some of the
conclusions represented in Tab. IV, blindly calculated, are weaker
than they may appear.
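
A minimal sketch of the Chauvenet calculation follows, assuming the usual form in which the deviation of the suspect point is measured from the sample mean in units of the standard deviation and the tail probability is two-sided. The function names are illustrative, the numbers in the example are invented, and the 0.5 threshold is the one stated in the text.

```python
import math


def t_score(x_suspect, mean, sigma):
    """Deviation of the suspect point from the mean, in units of sigma."""
    return abs(x_suspect - mean) / sigma


def chauvenet_n(n_points, t):
    """Expected number of points at least t sigma from the mean.

    Uses the two-sided normal tail probability Prob(outside t*sigma).
    """
    prob_outside = math.erfc(t / math.sqrt(2))
    return n_points * prob_outside


def is_rejection_candidate(n_points, t, threshold=0.5):
    """True when fewer than `threshold` such points would be expected."""
    return chauvenet_n(n_points, t) < threshold


# Invented example: a point 2 sigma out in sets of 100 and of 10 points.
n_large = chauvenet_n(100, 2.0)
n_small = chauvenet_n(10, 2.0)
```

The same deviation is unremarkable in a large data set but becomes a rejection candidate in a small one, which is why the criterion scales the tail probability by N.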

We observe a larger sensitivity of the $K^2$ statistic for FCI
pre-test scores for CWP pilot students. One student scored 3.2σ
above the mean score (26%). Applying Chauvenet's criterion, we get
for the parameter of merit n = 0.10, so the data point meets the
rejection criterion. Rejecting it and recalculating the $K^2$
statistic produces a large change, from $K^2$ = 10.78 to
$K^2$ = 4.55, which has a corresponding p-value of p ≈ 0.1027.
Therefore, the null hypothesis, that the data are normally
distributed, can no longer be confidently rejected.

Therefore, given the overall small population sizes and the
presence of arguable outliers (expected in such an exotic student
population) whose rejection can dramatically weaken the evidence
of non-normality, we hereafter assume that FCI and IELTS data are
normally distributed to a reasonable approximation. Furthermore,
when comparing and combining Abu Dhabi and Sharjah campus data, we
feel somewhat justified in using the simpler Welch's t-test, and
not the more computationally taxing rank-sum test, for comparing
mean scores.
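
For reference, Welch's t-test can be sketched as below: the statistic divides the difference in sample means by the unpooled standard error, and the degrees of freedom follow the Welch-Satterthwaite approximation. Only the statistic and degrees of freedom are computed here; converting them to a p-value requires the CDF of Student's t distribution, which the Python standard library does not provide. The function name and sample data are illustrative.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Does not assume equal variances or equal sample sizes.
    """
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Invented example data for two small groups.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

When the two groups have equal sizes and equal variances, as in this toy example, the degrees of freedom reduce to the familiar pooled value.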

Now, we return to our original question: are there any
statistically significant differences between student scores on
the Abu Dhabi and Sharjah campuses? Table V shows FCI pre-test,
post-test, and Hake's gain scores for two course offerings where
the FCI was administered on both Abu Dhabi and Sharjah campuses.
Anticipating the possible existence of a gender gap, we also
compared scores for both genders. For pre-test and post-test data,
all scores gathered are included, but for gain scores, strict
matching is applied, which is why population sizes for some gains
are smaller than either pre- or post-test sizes. Symbols in the
column headings, N, S, σ, and Δ, stand respectively for the
population size, the average test score for that row, the standard
deviation in those scores, and the difference between the average
Abu Dhabi and average Sharjah campus score. The last column shows
p-values obtained as the result of a Welch's t-test, where the
null hypothesis is that the mean test scores on the two campuses
are equal. Clearly, there is no significant evidence that student
FCI scores and gains differ for the two campuses. When the
population is broken down into the categories identified in the
above analysis (preparatory and direct entry), remaining
differences lose all statistical significance; however, the sample
sizes are limited, so it is difficult to attribute cause. So, all
that can be said at present is that there is little to no
statistically significant difference between the Abu Dhabi and
Sharjah campuses, in terms of students or treatments
(instruction), as measured by FCI. Consequently, we hereafter
merge both Fall 2010 data sets from the two campuses and both
Spring 2011 data sets from the two campuses. We then make a
detailed comparison between traditional instruction and our CWP
treatment. When CWP scores and gains are mentioned below, we mean
those from the combined Abu Dhabi and Sharjah data sets for Spring
2011.



E. Demographic Profile of Students 

To briefly summarize this section, we see compelling 
evidence that the university population is unusually het- 
erogeneous. There are substantial and statistically sig- 
nificant differences between two major sub-groups which 
are largely captured by categorizing them in terms of 
their admissions status: (1) direct admission, or (2) con- 



ditionally admitted via the preparatory program. Adding 
demographic data to the above analysis of test scores 
provides further insights. Table VI gives a comparison
of some of the salient demographic features of U.S. engi- 
neering students and KU students. Combining this data 
with that of the analysis above shows clearly that the KU 
student population is unique, certainly different from the 
typical population for which interactive engagement tech- 
niques were originally developed (i.e. students of fresh- 
man, calculus-based physics, mostly engineering or pre- 
engineering majors in the US). 



IV. DESIGNING THE COLLABORATION 
WORKSHOP 

In this section, the important features of the initial in- 
ternal proposal and the design process that was followed 
when creating the course are reviewed. At the time of 
that proposal's writing (summer of 2010), our group at 
KU elected to take an approach that approximated engi- 
neering design. The overarching goal of the design was to 
provide 'proof-of-principle', to show that interactive en- 
gagement teaching could be adapted to a UAE context 
and that by changing pedagogy alone, greater student 
learning is possible. This began with crafting a problem 
statement using the then-existing baseline data presented 
in Sec. III. Objectives, constraints, and significant risks 
for any possible alternative pedagogy were then added to 
this problem statement, allowing for boundaries of the 
'solution space' to be defined. Last, the group enumer- 
ated available assets, such as institutional resources, fac- 
ulty and staff skills and experience, and student traits, in 
order to explore how each could be used in converging on 
a plausible pedagogical alternative that exploited those 
assets while remaining within the constraints. The con- 
cept and discussion of frames of context by Finkelstein 
and Pollock in Ref.** proved a very helpful perspective 
for parsing diverse stakeholder values and maintaining 
focus only on those values associated with Type II and 
Type III transfers of norms and avoiding tempting Type 
IV transfers (see Sec. II). The cultural values and ex- 
pectations shared by the majority of students, that have 
their origin in the larger cultural context of UAE soci- 
ety, would have to be incorporated and capitalized on 
in the design of a successful alternative pedagogy. The 
impetus for making this approach work and to succeed 
in reforming the physics instruction is obvious from Tab. 
VI. Conditionally admitted students, who are predom- 
inantly UAE nationals, make up 70% of the university 
student body and show alarmingly little response to tra- 
ditional lecture instruction, to the extent that reforming 
pedagogy became a moral imperative. 

What follows below are student learning objectives 
(L), boundary constraints (B), cultural objectives (C), 
and risks (R) identified by our group. Each is listed 
with a brief discussion of a values/expectations-based justification. 

TABLE V. Comparison of mean FCI scores from Abu Dhabi (AD) and Sharjah (Sh) Campuses. 

Semester     Test           Population  N_AD  N_Sh  b (σ), AD      b (σ), Sh      |Δ|    p-value (a)
Fall 2010    FCI Pre-test   All         68    29    0.29 (0.12)    0.23 (0.11)    0.06   0.0311
                            Direct      23    6     0.36 (0.15)    0.16 (0.06)    0.20   0.1191
                            Prep        40    23    0.27 (0.08)    0.25 (0.11)    0.02   0.4781
                            Males       34    25    0.31 (0.14)    0.23 (0.12)    0.09   0.0474
                            Females     34    4     0.27 (0.10)    0.25 (0.04)    0.02   0.5472
             FCI Post-test  All         64    28    0.36 (0.21)    0.28 (0.09)    0.08   0.0354
                            Direct      22    6     0.57 (0.22)    0.24 (0.08)    0.33   0.1053
                            Prep        40    22    0.26 (0.09)    0.30 (0.10)    0.04   0.1905
                            Males       31    24    0.39 (0.23)    0.29 (0.10)    0.09   0.2968
                            Females     33    4     0.34 (0.19)    0.23 (0.06)    0.12   0.2216
             Hake           All         56    28    0.10 (0.24)    0.07 (0.14)    0.03   0.4927
                            Direct      22    6     0.33 (0.26)    0.09 (0.10)    0.24   0.1792
                            Prep        40    22    -0.01 (0.12)   0.07 (0.15)    0.08   0.1295
                            Males       31    24    0.11 (0.27)    0.09 (0.15)    0.02   0.7979
                            Females     33    4     0.10 (0.21)    -0.03 (0.07)   0.13   0.2382
Spring 2011  FCI Pre-test   All         64    21    0.27 (0.11)    0.36 (0.22)    0.09   0.3355
                            Direct      --    --    --             --             --     --
                            Prep        48    21    0.28 (0.11)    0.36 (0.22)    0.08   0.3612
                            Males       33    12    0.26 (0.10)    0.39 (0.25)    0.13   0.3207
                            Females     31    9     0.29 (0.11)    0.31 (0.18)    0.03   0.7591
             FCI Post-test  All         65    21    0.35 (0.17)    0.45 (0.24)    0.09   0.3508
                            Direct      --    --    --             --             --     --
                            Prep        48    21    0.38 (0.18)    0.45 (0.24)    0.07   0.4527
                            Males       38    12    0.36 (0.16)    0.52 (0.25)    0.16   0.2901
                            Females     27    9     0.35 (0.19)    0.35 (0.20)    0.01   0.9480
             Hake           All         57    21    0.11 (0.20)    0.14 (0.32)    0.03   0.7953
                            Direct      --    --    --             --             --     --
                            Prep        48    21    0.14 (0.21)    0.14 (0.32)    0.00   0.9973
                            Males       30    12    0.14 (0.19)    0.21 (0.35)    0.07   0.6379
                            Females     27    9     0.08 (0.22)    0.06 (0.27)    0.02   0.8460

(a) In each case, the null hypothesis associated with the listed p-value is that the average FCI scores are equal. 

TABLE VI. Comparison of demographic and learning gains data from traditional, first-semester, calculus-based physics courses 
for U.S. and KU engineering students. 

U.S. Engineering Students            KU Direct Admission         KU Conditional Admission
19.5% women                          70% women                   35% women
82% Caucasian or Asian               80% other MENA (a)          90% Gulf Arab (UAE)
13% under-represented minorities     20% African or Asian        10% African or Asian
1-in-5 are 1st generation in         comparable to U.S.          1-in-3 are 1st generation in
  their family to attend college                                   their family to attend high school
1-in-9 not a U.S. citizen            9-in-10 not UAE citizens    1-in-10 not UAE citizens
17% are ELL students                 >90% are ELL students       100% are ELL students
8.0 IELTS is 'native' speaker        6.5 average IELTS           5.7 average IELTS
⟨g⟩_Hake ≈ 0.22 ± 0.02               ⟨g⟩_Hake = 0.20 ± 0.05      ⟨g⟩_Hake = 0.03 ± 0.03
DFW rate: 10-20%                     DFW rate: 7%                DFW rate: 50%

(a) Middle East and North Africa 

A reformed instructional strategy with reasonable prospects for successful implementation should: 



A. Student Learning Objectives 

1. L1 Deliver the same content: The new pedagogical 
approach should deliver the same topical/subject 
matter content that was presented by the traditional 
lecture version of the course. The reasons 
were two-fold: (1) to avoid the need to amend the 
official course syllabus with either the University-wide 
curriculum committee or the UAE Ministry 
of Higher Education, thereby delaying or jeopardizing 
the reform project, and (2) to limit the 
reform project's exposure to the criticism that 
substantial improvements in learning gains were 
achieved solely by covering less material and focusing 
on fewer concepts for longer periods of time. 
2. L2 Reduce the course failure rate by half: To 
demonstrate value added to the university, the new 
pedagogy should make a significant positive impact 
on student retention by lowering the then-current 
course DFW rate from ~40% (mostly accounted for 
by UAE national students) to 20% or less. 

3. L3 Increase student problem-solving ability on 'end-of-chapter' 
style problems: The new pedagogical 
approach should increase the ability of students to 
solve 'end-of-chapter' type problems by 10%, in or- 
der to bring exam performance up to a level con- 
sistent with achieving L2. The reasons for this re- 
quirement were two: (1) Universities in the UAE 
exercise a much larger level of oversight and qual- 
ity control on summative examinations, consistent 
with their British-style inspirations, such as with 
exam writing committees, external examiners (3rd 
party graders) and periodic course file reviews by 
the UAE Ministry of Higher Education. In most of 
these contexts, exam writing norms consistent with 
the traditional lecture method (though not neces- 
sarily consistent with British-styled curricula) are 
enshrined in policy. Therefore, it would be difficult 
for instructors to use exams with a large fraction of 
purely conceptual assessment tasks. And (2), the 
group wanted to similarly limit the reform project's 
exposure to criticism that increased problem solv- 
ing performance was achieved by giving students 
problems that are perceived by many in the com- 
munity as "easy, conceptual questions". 

4. L4 Provide hands-on laboratory experience: Most 
of our conditionally admitted students have had no 
exposure to experimental science in their secondary 
school curriculum. Furthermore, a well-designed 
kinesthetic experience is often most useful for ad- 
dressing common misconceptions in introductory 
mechanics. Therefore, it is deemed critically im- 
portant to have strong laboratory components to 
instruction vertically integrated through the entire 
college. First-year chemistry and physics courses 
must provide an effective cornerstone for students' 
development of laboratory skills. 

5. L5 Double conceptual learning gains: To line up 
with the metrics (in this case FCI learning gains) 
that were being used to form the basis of the pro- 
posal's problem statement and the motivation for 
the reform project, the new pedagogy must sub- 
stantially increase conceptual learning gains. 

6. L6 Increase student performance in core engineer- 
ing science courses: All of the teaching done by 
the Department of Applied Mathematics and Sci- 
ences is service teaching to the College of Engineer- 
ing, making the engineering departments primary 
stakeholders. Therefore, for the reform project 
to gain long-term credibility and sustainability, it 
should demonstrate that it produces students who 
are better equipped to navigate the engineering 
curriculum, in courses such as engineering statics, 
engineering dynamics, circuits, etc. 
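
Objectives L2 and L5 are stated as numerical targets, so they can be checked mechanically once pre/post data are in hand. The sketch below assumes the standard definition of the class-average normalized (Hake) gain, g = (post - pre)/(1 - pre), with scores expressed as fractions. The function names and the 'halved' and 'doubled' thresholds are illustrative readings of L2 and L5, not code used in the study.

```python
def normalized_gain(pre, post):
    """Class-average Hake gain from mean pre/post scores given as fractions (0 to 1)."""
    if not 0.0 <= pre < 1.0:
        raise ValueError("pre-test mean must lie in [0, 1)")
    return (post - pre) / (1.0 - pre)


def meets_l2(dfw_before, dfw_after):
    """L2 (assumed reading): the DFW rate is cut at least in half."""
    return dfw_after <= 0.5 * dfw_before


def meets_l5(gain_before, gain_after):
    """L5 (assumed reading): the normalized gain at least doubles."""
    if gain_before <= 0.0:
        raise ValueError("doubling is only meaningful for a positive baseline gain")
    return gain_after >= 2.0 * gain_before


# Rounded Abu Dhabi 'All' means for Fall 2010 from Table V: pre 0.29, post 0.36.
g_baseline = normalized_gain(0.29, 0.36)
print(f"baseline gain = {g_baseline:.2f}")
print("L2 met:", meets_l2(0.40, 0.20))
```

Note that a gain computed from class means, as here, need not equal the average of individual student gains, so small differences from tabulated values are to be expected.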

B. Boundary Constraints 

1. B1 Not change the contact time model: As discussed 
in Sec. III, the contact model for the traditional 
lecture-centered version of the course was 
"3+1" inclusive, meaning there are 3 credit hours 
available for lectures and 1 credit hour (3 con- 
tact hours) available for a laboratory and these 
two components are a single, 4 credit hour course. 
This contact model, which is enshrined in policy, 
is another example of an institutional norm that 
implicitly supports traditional lecture-centered in- 
struction. This is evident from the fact that most 
reform projects at other universities have changed 
the contact time model for pedagogical reasons (e.g. 
Ref4i). 

2. B2 Not require new teaching staff: Due to the 
start-up nature of the university and the recruit- 
ing challenges that established institutions in the 
UAE face, time scales for identification of new staff 
needs and initiation of hiring searches are typically 
twice as long as at US universities. Therefore, any new 
pedagogical approach could not require additional 
teaching staff without facing significant delays. 

3. B3 Not purchase new laboratory or teaching tech- 
nology: Again, due to the start-up nature of the 
university and long turn-around times for procure- 
ment, purchasing and customs clearance in the 
UAE, any new pedagogical approach would have 
to function with existing laboratory and classroom 
equipment. The reform project was funded, but 
spending that funding and adding value through 
equipment purchases proved very difficult. 

C. Cultural Objectives 

1. C1 Use group work based on effective single-gender 
teams: It was decided early in the course design 
process that mixed gender teams, while desirable 
in nearly every team-based, PER-based pedagogy, 
were an unnecessary risk in the KU context. Hav- 
ing teams of all male and teams of all female stu- 
dents working in the same classroom space was con- 
sidered to be an appropriate middle-step in a spec- 
trum of gradual gender integration across the stu- 
dents' university career. In high school, for UAE 
nationals, the genders are geographically separated 
and never see each other. In preparatory pro- 
gram instruction, they learn passively and alone in 
mostly traditional instructional settings, but with 
the opposite gender present in the same room. And 
in upper-class courses, as engineering majors, they 
all work on capstone design projects in mixed-gender 
teams. Furthermore, as shown in Sec. III, 
there is no evidence of a significant gender-gap 
in FCI performance (with a larger sample size, it 
is likely that female students slightly out-perform 
their male peers) but there is quite a large perfor- 
mance gap between direct entry (mostly expatriate 
Arabs) and prep-track (mostly UAE national) stu- 
dents. Student-to-student interactions are essen- 
tial for all IE pedagogies and the underlying hy- 
pothesis of teaming strategies, for those pedagogies 
that most heavily rely on extended group work, is 
that a diversity of ability levels and perspectives in 
the team members produces greater learning. In 
the KU context, if we are to assign students to 
teams and risk causing negative expectancy viola- 
tion (since most students will want to work with 
family or tribe members, or close friends) it would 
be more beneficial for their learning to ask them to 
work with same-gender members of other families, 
tribal or ethnic groups, than to ask them to work 
with the opposite gender. 

2. C2 Use mixed-ethnicity teams: Conversely, as 
mentioned above, the most substantial performance 
gap on FCI and course exams is across the differ- 
ent admissions categories (and consequently, across 
ethnic backgrounds), not across genders. Negative 
sentiments should be minimized by making the pur- 
pose of the teaming as transparent as possible. Stu- 
dents must be made aware, both at the beginning 
of the course through an orientation, and repeat- 
edly throughout the course, of the teaming recipe 
used and why it is important for their learning. 

3. C3 Conduct and document frequent self-assessments: 
As mentioned in Sec. II and as 
suggested above by our objective to team students 
with other family/tribal groups, the course design 
must include regular feedback and performance 
monitoring, in anticipation of KU students' 
demonstrated tendency to make special requests and 
complaints. Anonymous feedback surveys were 
seen by all as a way to gauge student discomfort 
levels early in the course and in a manner that 
would encourage them to express themselves more 
freely than they might in a face-to-face conversa- 
tion. Also, continuous feedback and performance 
reports would allow the teaching team to respond 
using data to any complaints passed from students 
through university administrators. 
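
The teaming constraints in C1 and C2 can be read as a simple assignment rule. The following is a minimal sketch under stated assumptions, not the recipe actually used in the course: the function name `form_teams`, the dictionary keys `name`, `gender` and `track`, the default team size of four, and the round-robin dealing rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def form_teams(students, team_size=4):
    """Group students into single-gender teams that mix admission tracks.

    `students` is a list of dicts with the (hypothetical) keys
    'name', 'gender' and 'track'.
    """
    by_gender = defaultdict(list)
    for s in students:
        by_gender[s["gender"]].append(s)

    teams = []
    for gender in sorted(by_gender):
        # Sort by track, then deal round-robin so each track is spread across teams.
        pool = sorted(by_gender[gender], key=lambda s: s["track"])
        n_teams = max(1, -(-len(pool) // team_size))  # ceiling division
        gender_teams = [[] for _ in range(n_teams)]
        for i, s in enumerate(pool):
            gender_teams[i % n_teams].append(s)
        teams.extend(gender_teams)
    return teams


# Invented roster: 8 women and 8 men, one in four admitted directly.
roster = [
    {"name": f"S{i}",
     "gender": "F" if i < 8 else "M",
     "track": "direct" if i % 4 == 0 else "prep"}
    for i in range(16)
]
teams = form_teams(roster)
for team in teams:
    print([(s["name"], s["track"]) for s in team])
```

Publishing a rule of this kind to students is one way to make the purpose of the teaming transparent, as C2 requires; additional balancing criteria could be added as further sort keys.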

D. Risks 

1. R1 Avoid high faculty skepticism: In the year that 
this project was proposed, the university held a series 
of faculty and staff workshops centered around 
the theme of innovative education in engineering. 
Outside presenters at the workshops consistently 
criticized lecture-centered instruction and show- 
cased innovative approaches being developed and 
used at their home institutions in the US, Europe 
or the Far East. Anecdotally, there was clear evi- 
dence from questions and discussions at these work- 
shops, that most faculty were very skeptical of the 
effectiveness of departures from traditional lecture- 
centered instruction. The skepticism took on two 
distinct forms, or schools of thought. Those subscribing 
to the first generally considered the alternative 
pedagogies presented to be interesting, innovative 
and effective, but only for the presenters' own students. These 
faculty were very skeptical, and for good reasons, 
that any such approaches would work for our students 
without major adaptation. Those of the second 
group held an opinion more consistent with 
what physics education reforms in the West have 
addressed, namely, the belief that there is nothing 
fundamentally wrong with the lecture- centered ap- 
proach itself (e.g. Ref.^^). 

2. R2 Avoid high student anxiety: Interactive engage- 
ment approaches are known for raising student anx- 
iety levels. In the context of an individual task, 
the instructors' expectation of sense-making rather 
than answer-making calls for more abstract think- 
ing, strains working memory more, and, in the 
process of cueing and confronting preconceptions, 
produces cognitive dissonance and mental discom- 
fort more often than passively attending lectures 
or following a lab recipe. For KU students in par- 
ticular, this is compounded by additional stressors 
at the course level, since students have rarely if 
ever experienced a course that is similar to in- 
teractive engagement (as mentioned above, many 
students have not even experienced a traditional 
recipe lab in their primary and secondary school 
science courses) and are engaging course content in 
a second-language. The risks of high student anx- 
iety are serious. Students at KU have much more 
access and can provide one-on-one direct feedback 
to university administration about any issue. This 
can place significant administrative pressure on any 
course using an alternative pedagogy. 

3. R3 Avoid loose integration of interactive engage- 
ment activities: Low learning gain IE courses re- 
ported in the literature usually report in the post- 
analysis that IE activities were used in a way that 
did not inform the broader norms of the course 
(see e.g. Refs4^i^). Feedback to students for 
heads-on/hands-on activities was ill-timed, or feedback 
loops weren't completed at all, and/or conceptual 
tasks were not represented on high-stakes 
exams or were given little weight in the overall assessment. 
This gives students the impression that 
IE activities were integrated ad hoc, did not reflect 
what the instructors 'really' expect of students, 
and were not a serious part of the course. 
In other words, the students believe the traditional 
instruction 'is the real instruction', and traditional 
learning (i.e. rote learning) 'is the real learning' 
that is expected, in spite of what the instructors 
are saying or doing. This risk was and remains 
very significant in the UAE context partly due to 
the exam writing/reviewing idio-culture mentioned 
above (see L3, Sec. IV A). 

4. R4 Avoid lack of IE teaching experience: None of 
these authors have advanced degrees in PER. Two 
of these authors (GWH and AFI) have some expe- 
rience (see Refs;-- and^^, respectively) using PER- 
based instructional strategies. 



E. Course design evaluation 

Eight different interactive engagement approaches were 
considered for possible full, partial, or hybrid implemen- 
tation. These were: 

1. Cooperative Group Problem Solving (CGPS) 

2. Just-in-time Teaching (JiTT) 

3. Modeling Instruction (MI) 

4. Peer Instruction (PI) 

5. Student-Centered Active Learning Environment for 
Undergraduate Programs (SCALE-UP) 

6. Socratic Dialog Inducing Laboratories (SDI Lab) 

7. Tutorials (Ttls) 

8. Workshop Physics (WP) 

Table VII shows the results of our group's considerations 
for eight different published IE instructional strategies. 
Each of the goals listed above and the means to 
achieve that goal are given a row in the table. Across 
columns, references are given, where they could be found, 
to evidence for (✓) or against (X) that approach supplying 
the means specified by the desired goal and function. 
While entries are based on the references shown, the rel- 
evance and interpretation of the evidence presented is 
influenced by the subjective judgment of these authors 
and some of the decisions listed deserve some discussion. 



1. Cooperative Group Problem Solving 

As shown in Tab. VII, the CGPS approach leads in 
positive evidence that it will match the needs of the KU 
context and scores relatively low on counts of negative 
and no evidence. Regarding positive evidence, CGPS 
requires no course content coverage reduction (L1), includes 
its own lab curriculum (L4), is adaptable to a 
"3+1" contact time model (B1), is given the designation 
of being a "low-effort" methodology (B2), and features 
a 'traditional-looking' course structure, all because 
it was created so by design. CGPS was created by starting 
with the traditional course structure and iteratively 
changing the function of the individual parts within that 
structure. As summarized by Heller & Heller, "...we have 
been developing a conservative model that conforms with 
the usual structure and focus of the large introductory 
physics course..." and "The Minnesota model is based on 
the familiar triad of lectures, laboratories, and recitation 
(discussion) section" (RefJ^, p. 4). The two instances of 
negative evidence for CGPS (B3 and R3) are due to, respectively: 
(1) the heavy use of video cameras, equipment 
that KU does not have, in their problem-solving 
labs and (2) the design process followed by CGPS: starting 
with the traditional course structure, then modifying 
it iteratively. It is well known that in the traditional, 
large-lecture physics course, a topic is treated in lab 
weeks apart from the lecture, and it was felt by these 
authors that this means "tight integration" (R3) is not 
accomplished by design if the traditional contact time 
model is followed. This does not mean lab and lecture 
are not synchronized in practice, just that it is not an automatic 
consequence of the course design. Perhaps most 
important, in terms of cultural expectations at KU, the 
CGPS method has already been studied with gender issues 
in teaming (C1) and efficacy with under-represented 
and at-risk student groups (C2) in mind (e.g. Refs^i^ 
respectively), and has evidenced positive improvements 
in retention and learning. 



2. Just-in-Time Teaching and Peer Instruction 

JiTT, and similarly Peer Instruction, have many posi- 
tive features that we anticipate would be quite beneficial 
if implemented in the KU context. However, there are 
a few facts that removed them from consideration. The 
foremost reason is that a learning management system 
(LMS; e.g. WebCT, Blackboard, Moodle, etc.) is es- 
sentially a prerequisite for implementing JiTT or PI, as 
the LMS is used to administer pre-class reading comprehension 
quizzes. At the time of our reformed course 
design exercise, KU did not have an LMS, which is indicated 
as negative evidence for goal B3. Furthermore, a 
JiTT or PI implementation would still require adoption 
or creation of a separate laboratory curriculum (L4). For 
PI in particular, there is also often the need to reduce 
course content coverage (L1) to give time to in-lecture 
peer discussions. KU has since implemented Moodle 
for its LMS, so it is very likely that JiTT, and specifically 
PI, will be considered in future efforts to more deeply re- 
form the lecture portion of the course. 
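
The comparison of "counts" of positive and negative evidence used in the discussion above amounts to a tally over an evaluation matrix. The sketch below encodes only the marks that are stated explicitly in the prose of the two preceding subsections (goals not mentioned there are simply omitted), so it illustrates the bookkeeping and is not a transcription of the full matrix. The names `evidence` and `tally` are hypothetical.

```python
from collections import Counter

# '+' = published evidence for, '-' = evidence against, as stated in the text above.
evidence = {
    "CGPS": {"L1": "+", "L4": "+", "B1": "+", "B2": "+", "C1": "+", "C2": "+",
             "B3": "-", "R3": "-"},
    "JiTT": {"B3": "-", "L4": "-"},
    "PI": {"B3": "-", "L4": "-", "L1": "-"},
}


def tally(marks):
    """Return (number of '+' marks, number of '-' marks) for one method."""
    counts = Counter(marks.values())
    return counts["+"], counts["-"]


summary = {method: tally(marks) for method, marks in evidence.items()}

# Rank by most positive evidence, breaking ties by least negative evidence.
ranking = sorted(summary.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
for method, (pos, neg) in ranking:
    print(f"{method}: {pos} for, {neg} against")
```

A plain count weights every goal equally; as the text notes, the interpretation of each entry still rests on the judgment of the authors.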
TABLE VII. Evaluation matrix for published IE pedagogies considered for the KU context. For each goal and means (rows), the columns record published evidence that a method provides the needed means, for CGPS, JiTT, MI, PI, SCALE-UP, SDI Labs, Tutorials, and WP. 

Goal  Means
L1    Requires no coverage reduction
L2    Efficacy with URM & 'at-risk' students
L3    Increases trad. problem solving skill
L4    Includes effective lab curriculum
L5    Increases FCI (or FMCE) gains
L6    Attract/retain more engineering majors
C1

3. Modeling Instruction, SCALE-UP, SDI Labs, and 
Workshop Physics 

Modeling Instruction, SCALE-UP, SDI Labs and 
Workshop Physics are certainly the most experimentally- 
focused and kinesthetically engaging of those PER-based 
instructional strategies considered here. The performance 
of most of them has also been extensively studied 
in secondary implementations across high schools, colleges, 
and universities in the US. Consequently, these instructional 
strategies present some of the most attractive 
options available in the PER literature for achieving the 
learning goals identified for the KU context (L1-L6). Unfortunately, 
the radical changes to course structure that 
they respectively require are too institutionally risky at 
KU (B1-B3) to be attempted at present. Increased credibility 
created by "early wins" from a less taxing first 
reform project may change this situation in the future. 



4. Tutorials 

Tutorials (Ttls) packages developed by University of 
Washington (UW), University of Maryland (UM) 
and Open Source (OS) versions were considered. The 
lack of a dedicated recitation hour in the existing KU 
course contact model meant any Ttls set would have to be 
implemented in a non-standard way, which has been done 
previously in all three cases (hence satisfaction of B1). 
However, this is most often done by using them as group 
activities during lecture instruction, which puts satisfaction 
of R1 in question. So, while Ttls are not considered 
a stand-alone solution to the course reform design problem, 
as a library of PER-based tasks they are very attractive 
for use. One author (AFI) has experience with 
UW Ttls and there is more data on their performance, so 
while Ttls in general were selected for hybridization into 
the CWP session (see Sec. IV F below), UW versions were 
used (and modified, mostly to simplify language use) exclusively. 



F. Collaborative Workshop Physics (CWP): 
Course Structure Prototype 

FIG. 6. A flowchart representation, timeline, and description of tasks in the 3-hour Collaboration Workshop. 

Task Flowchart 

Course level: Pre-testing & tool training (1 & 5 hours, x2 weeks) -> CWP Sessions (3 hours, x12 weeks) -> Post-testing (1 hour) -> Course End 
Session level: Tutorial -> Experimental Problem -> Context-Rich Problem -> Team Deliverables 

Timeline 

Tutorial: 25 min. 
Experimental Problem: 60-90 min. 
Context-Rich Problem: 90-60 min. 

Task Features 

TUTORIAL 

1. Cue topic and/or misconception(s) 

2. Heavy individual coaching on related representation 

3. Team debate to reach consensus (deliverable) 

EXPERIMENTAL PROBLEM 

1. Concrete question, no solution procedure given 

2. Team debate w/moderate coaching to propose measurement 
procedure 

3. Kinesthetic experience of a simple system well-described by the 
representation introduced by TUTORIAL 

4. Measurement techniques, error considerations 

5. Team debate to reach consensus on an evidence-based answer 
(deliverable) 

CONTEXT-RICH PROBLEM 

1. Abstraction of concepts now introduced concretely 

2. Migration of representation used in solving a simple, concrete 
problem to solving a context-rich problem (which are realistic, 
ill-posed, under-defined, open-ended, etc.) 

3. Team debates to reach consensus solution (deliverable) 

DELIVERABLES (1 ea. per team representing consensus) 

1. 1 completed tutorial worksheet 

2. 1 page-only solution to experimental problem 

3. 1 page-only solution to context-rich problem 

Following the above considerations, we converged on 
creation of a hybrid approach which we call 'Collaborative 
Workshop Physics' (CWP). The most important 
constraint was that the lecture remain unchanged and 
so we focused on creating a reformed learning experience 
with the 3-hour lab period available each week. The principal 
components of the instructional strategy are taken 
from CGPS and various Tutorials; however, both were 
substantially modified to fit the context and design constraints. 
Aside from showing evidence that CGPS could 
supply the means needed for the course reform, context- 
rich problems (the central feature of CGPS recitation in- 
struction) were seen very favorably because of their sim- 
ilarities with problems posed in innovative engineering 
design education. Specifically, design problems in engi- 
neering that are considered salient are often described in 
engineering education literature as being "realistic", "ill-posed", 
"ill-structured" or "open-ended", which are terms 
also used to describe context-rich problems in physics. 
Unlike traditional end-of-chapter or analysis problems, 
both kinds of problems also require skills such as 
tolerance for uncertainty, estimation, big-picture thinking, 
self-questioning for clarification, teamwork, and the 
use of multiple representations (see 
Ref.™., Sec. II "On Design Thinking" for example). It 
is reasonable to assume that both kinds of problems call 
upon similar cognitive resources and produce similar cog- 
nitive loads, though the goals for each kind of problem are 
different (creating a product vs. creating a prediction). 
This similarity is attractive for physics course design for 
creating a 'knock-on' effect that could indirectly benefit 
students in later engineering design courses and thereby 
positively contribute toward learning objective L6. 

The next most important constraint, and one of the 
few reasons all CGPS features present in the University 
of Minnesota model could not be implemented 'off-the-shelf', 
was the lack of necessary lab equipment and the inability 
to retool (B3). However, this is easily mitigated, 
since the CGPS recitation sessions and CGPS laboratory 
sessions were originally designed to operate independently 
of each other. Therefore, we chose to create an instructional 
strategy that borrowed only recitation/problem-solving 
techniques from the CGPS model. That meant, 
however, that goal L4, and possibly L5, could not be 
satisfied, because there is no provision for a lab curriculum 
and there are limited opportunities in a CGPS recitation to 
cue mechanics misconceptions in a concrete, kinesthetic 
manner. To mitigate this, we took inspiration from 
the "box-of-probes" philosophy of Workshop Physics and 
the equipment already available on-site. From this perspective, 
the simplest way to replace existing recipe labs 
with a reformed version was to narrow the goals of the 
experiments and remove their given procedures. 

We called these shortened labs experiment problems. 
They are given to the student teams in the form of a 
single, simple question and the main tasks are to reach 
a consensus on a measurement protocol, execute with 
the available measuring tools, and answer the question 
within 1-page only, through evidence-based reasoning 
with their measurement results and error analysis. The 
particular phenomenon investigated is chosen such that 
it is a concrete experience of a simple system sharing 
the same underlying physical principles involved in the 
context-rich group problem. By posing the experiment 
problem before the context-rich problem, students' mechanical 
misconceptions are afforded an opportunity for 
concrete cueing, similar to the manner in which many 
such misconceptions are formed (kinesthetic experience 
during childhood sensory-motor development). Instructor 
coaching of teams, during their procedure design and 
when later solving the context-rich group problem, can 
then take on the form of short 'Socratic dialogues', meant 
to draw attention to, elicit reflection on, and provide targeted 
intervention against misconceptions. But it is ultimately 
left to peer discussions, enriched by such periodic 
coaching visits, to discover the correct interpretation or 
solution, and proceed once consensus is reached. 

Despite the intent to lower the 'learning curve' for 
context-rich problems, the experiment problems alone 
were not deemed sufficient for effective preparation. It 
was decided to conduct a 'warm-up' that featured heavily 
scaffolded training with drawing representations, to 
precede the experiment problem. The representation 
featured in the 'warm-up' exercise would again be the 
same one most useful for thinking through the experi- 
ment problem and the context-rich problem. However, 
the exercise would be 'drill-like', ungraded, and used pri- 
marily to teach the student how to clearly draw that 
representation and establish correspondence between its 
features and the features of an example physical system. 

Figure [H] shows a flow chart representation of the Col- 
laboration Workshop instructional strategy. Prior to the 
first session, students meet in their lab sections twice; 
once to take a small battery of pre-instruction assess- 
ments (FCI and a mathematics test) and the second time 
to receive individual training with some basic tools (me- 
ter stick, vernier caliper, digital stop watch, digital scale, 
photogate/ light barrier with digital signal box, air track 
with glider carts). The following core features of the 
CGPS instructional strategy^^^, are followed thereafter, 
but with the following caveats: 

1. Formation of well-functioning cooperative groups. 
Students are assigned to teams matching the ideal
composition reported in Ref.S and conforming to context
constraints (C1, C2). Each team is gender-homogeneous
and composed of 4 students having a heterogeneous skill
distribution (based on FCI pre-test, a pre-calculus math
test, and IELTS score drawn from admissions records).
Each team contains at least one member having a
relatively high FCI pre-test score, one having a relatively
high math pre-test score, one having a high IELTS score,
and one who is an expatriate. Each member is given
a rotating assignment to one of four different roles; 
Manager, Recorder, Skeptic, or Idea Generator. 
The role of Idea Generator is modified from that of 
Explainer—, in that they are coached to think of as 
many different solution strategies as they can for 
their team's problem, in addition to clarifying. As 
mentioned above, this is motivated by the desire to 
model similar habits of mind also found in effective 
engineering design education, where solving design 
problems is understood as a "divergent-convergent" 
questioning process^. Idea Generators are coached 
to personally introduce, and to elicit from others,
as many solutions to the team's problem as possible
(i.e. divergent thinkingS^). Students in the
remaining roles are coached in the standard way*.
Skeptics are coached to ask probing questions in
search of weaknesses in the physics behind ideas,
with the goal of convincing the team to eliminate
them from consideration (i.e. convergent thinking).
Managers are coached to facilitate team dynamics; 
keeping the team on task, organizing work into sub- 
tasks, and sequencing the team's work on the task 
in a logical order. Recorders are coached to check 
each member for their agreement on a plan of action 
and to only record what is reached by consensus. 
Roles are rotated after four weeks (four sessions) 
and students are reassigned to new teams based on 
available formative assessment data (CWP session 
average, in-class quiz and exam scores) in addition 
to pre-test assessment and language level. 

2. Repeated reinforcement to use a prescribed problem- 
solving strategy. Students are explicitly taught 
a problem-solving strategy that is essentially the 
same as that described in Ref though with a few 
minor innovations. The sequence taught is "read -
imagine - draw - graph/diagram - choose theory -
measure/calculate" and it is reinforced with each
of the three components of the CWP sessions (tutorial,
experiment problem, context-rich problem).
Reading is explicitly added to the strategy, and coaching
attention is specifically devoted to it, since
nearly all KU students are ELLs (see Tab. VI)
and often miss important information that is
not represented numerically in the problem statement.
The effect of this is not unlike that of other
well-known student attitudes about problem solving
(e.g.^^), but the combination of novice
attitudes toward problem solving and the second-language
learning environment means 'plug-n-chug'
solution strategies are very common and very difficult
to correct without persistent intervention. Lectures
for the course, while largely unchanged in
terms of content and format (goals & means L1,
B1, and R1), are paced so that new concepts are
introduced by the CWP session and then reviewed and
abstracted upon by the lecture. Example problems
demonstrated during lecture model the same
problem-solving approach.

3. Using 'context-rich' problems as opposed to 'end- 
of-chapter' problems. This feature of the CGPS in- 
struction is implemented unchanged and some of 
the problems in the CGPS manualiS were used, 
though sometimes modified for context (i.e. chang- 
ing regional terms like "state trooper" to the more 
generic "police officer", etc.). Other problems used 
were designed following the recipe in Ref.^''. 

4. Individual accountability via positive interdependence
via goal interdependence. This feature of
the CGPS instructional strategy could not be implemented
as recommended in Ref.^. There, the
authors compared two different strategies they
tested for creating positive interdependence within
groups (goal interdependence and reward interdependence),
with the objective of fostering mutual
concern for individuals' success within groups and
personal accountability to contribute toward the
group effort. They found that reward interdependence,
created by adding a group problem solved in
recitation to the score of individual in-class exams,
was superior in this regard. For logistical reasons
and reasons explained in Sec. IV A for goal L3, we
were unable to implement this and instead relied on
goal interdependence, by requiring groups to reach
consensus on major outcomes and report only the
consensus on session deliverables.

ACTIVITY 1. TUTORIAL (25 min)

For two gliders before the collision in the figure below, assume that
mA = 2mB, and compare the magnitude and direction of the following
quantities:

the net forces on the two gliders at an instant during the collision
the changes in momentum of each of the gliders
Also find the kinetic energy of the gliders before and after the collision

ACTIVITY 2. EXPERIMENT PROBLEMS (90 min)

By placing two glider carts on the air track, we can study collisions between
them. As before, light gates allow us to determine their velocities before and
after they collide. Depending on the material at the point of contact (metal,
rubber, clay, springs, magnets, etc.), the gliders can exert a wide variety of
forces on each other.

Lab question: Is there anything that does NOT change, no matter what
material, no matter what force we choose?

Lab problem: What is the change in the momentum of the two-glider system
for a variety of forces present during their collision (each group should pick
just one force, not the same as your neighbour)?

ACTIVITY 3. CONTEXT-RICH PROBLEM (60 min)

As an employee of SEMA (Save Earth Mission Agency), you are analyzing a
collision between a space probe and an asteroid. Data log from the space
probe tells you that the probe was moving with the speed v_p right before it
collided with the slowly moving asteroid at the far edge of the Solar system.
You estimate that the mass of the asteroid is about three times the mass of your
space probe. You guess that something must have gone wrong with a
guidance computer on board, but you also need to check which of the several
possible SEMA-suggested scenarios for the dynamics of the space probe and
the asteroid after the collision fits the situation most closely.

i) v_sf = 0, v_af = 1/3 v_p

ii) v_sf = 1/4 v_p, v_af = 1/4 v_p

iii) v_sf = -v_p, v_af = 2/3 v_p

iv) v_sf = -1/2 v_p, v_af = 1/2 v_p

FIG. 7. An example sequence of activities in a CWP session,
in this case, for instruction on linear momentum.
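The team-composition rules stated in item 1 of the list above can be sketched as a simple validity check. This is only an illustrative sketch, not the teaming procedure actually used for the course: the `Student` record, the function name `team_is_valid`, and the three "high score" thresholds are hypothetical, and the check assumes that one student may satisfy more than one of the "at least one member" conditions.

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    gender: str        # e.g. "F" or "M"
    fci_pre: float     # FCI pre-test score
    math_pre: float    # pre-calculus math test score
    ielts: float       # overall IELTS band from admissions records
    expatriate: bool


def team_is_valid(team, fci_high, math_high, ielts_high):
    """Check one team against the composition rules of item 1.

    The thresholds defining a 'relatively high' score are hypothetical
    inputs; the source does not state numerical cut-offs.
    """
    if len(team) != 4:                       # teams of 4 students
        return False
    if len({s.gender for s in team}) != 1:   # gender-homogeneous
        return False
    return (
        any(s.fci_pre >= fci_high for s in team)
        and any(s.math_pre >= math_high for s in team)
        and any(s.ielts >= ielts_high for s in team)
        and any(s.expatriate for s in team)
    )


# Made-up example data, for illustration only.
team = [
    Student("A", "F", 12, 40, 5.5, False),
    Student("B", "F", 6, 80, 5.0, False),
    Student("C", "F", 5, 50, 7.0, False),
    Student("D", "F", 7, 45, 5.5, True),
]
print(team_is_valid(team, fci_high=10, math_high=70, ielts_high=6.5))
```

A check of this kind could be run each time teams are reshuffled after four sessions, with the formative assessment data folded into the thresholds.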



G. Collaborative Workshop Physics: Example 
Sequence of Session Activities 

Figure 7 illustrates an example set of activities used
in the CWP sessions and helps to explain how they are
chosen to form a coherent sequence. In this example,
students are working through a variety of tasks revolving
around linear momentum and conditions for its conservation.
First, notice that in all the tasks, there is little or
no numerical information given. This is done to reinforce
the explicit addition of, and attention to, reading in the
problem-solving strategy. Students are instructed to keep
their pens and pencils down for the first 5 minutes of each
activity and to read only. During this time, the instructors
make their first round of visits, asking students to simply
answer questions like, "What is the big idea?", "What
does the writer of this problem want from you?", and "What is
the goal your team needs to reach?" Feedback and coaching
are given on text analysis. For example, many students
struggle with the multiple uses of the word "moment" and
its apparent derivatives (e.g. "momentum"). Some
quite reasonably conclude that the word is used in reference
to 'time' (i.e. a very short time interval), rather than
to torque or momentum. This first round of coaching
allows instructors to engage in short discussions about
context and how context in the problem statement can
modify the meaning of jargonized words such as these.
If an issue appears to be common to all groups, the class
is stopped for a few words from the instructor. One
instructor recalls in their journal a
5 minute discussion of the word 'hammer' for the case
when it is used as a verb (i.e. 'to hammer a nail into
wood') rather than as a noun to identify the tool.

For the tutorial, coaching attention is more individual 
and direct, and emphasis is placed on drawing graphi- 
cal representations that effectively and compactly con- 
tain the information given in the text. Instructors visit 
teams a second time, just prior to finishing the tutorial, 
and this time ask questions about the sizes and directions 
of vectors, give small follow-up tasks, like drawing a cor- 
responding kinematic graph, and encourage members to 
compare and discuss their results. Every student in the
group is given a tutorial sheet to work with, but an
additional copy is given to the group's Recorder to sketch a
consensus drawing.

Following the tutorial, a single sheet with the printed
experiment problem is given to the Recorder in each
group. Experiment problems were created in such a
way that neither complicated measurements nor lengthy
data analysis were necessary for a satisfactory solution.
Rather, a simple, concrete question is posed about objects
and/or motion presented or demonstrated to the students,
but with no procedure and no suggestions for specific
equipment to be used for making measurements. Teams
are required to create a simple procedure, justify it
physically, and submit a concise, written (one-page,
front-only), evidence-based (measurements must be used in
their argument) solution. As the task is handed out, special
attention is again given to reading. Instructors follow
a common coaching scheme for the experiment problems,
which is as follows:

• On the first visit

  - Strong encouragement to read the problem statement

  - Discuss, "What is being asked of us?"

  - Generate a large number of ideas for possible solutions

  - Withhold criticism of each other's ideas

  - Strong discouragement from touching any lab equipment
    or making measurements for the first 20-30 minutes of
    the problem

• On a typical second visit (10 minutes after discussions begin)

  - Socratic questioning of the team, to gauge and to guide
    clear understanding of the problem statement

  - Strong encouragement to begin eliminating the weakest
    ideas for solutions

  - Strong encouragement to balance time spent converging
    on solutions versus building an apparatus and conducting
    measurements

On third and later visits, instructors focused on questioning
the team about features of their chosen solution
strategy (e.g. "Why did you choose this detector over
another?") and what would happen if they made modifications
(e.g. "What if we change the location of this photogate,
what will happen to your graphs?"). Early in the
course, during the first 1-2 CWP sessions, there was also
often a need to stop the whole class and present a few tips
for effective team work, to warn groups against common
pitfalls. The most common is the tendency to 'fall in love
with the first idea' and converge (usually under duress
from a perceived lack of time) too quickly on executing
a sub-standard solution strategy. At the conclusion of
the experiment problem, teams were often encouraged to
take a 5-10 minute break. Upon return, one context-rich
group problem is distributed to each Recorder. Again,
teams are coached to not write anything for the first 10
minutes or so, but rather to read, discuss, and answer
amongst themselves, "What is the goal, what is the writer
of this problem asking us for?" Instructors made a visit
after this initial period to elicit reflection.
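As an illustration of the reasoning the context-rich problem in Fig. 7 is meant to provoke, the four suggested outcomes can be screened against momentum conservation and against the available kinetic energy. The sketch below is not part of the course materials. It assumes our reading of the figure's garbled option list (probe and asteroid final velocities in units of the probe's initial speed), treats the "slowly moving" asteroid as initially at rest, and assumes no energy is released in the collision; the names `SCENARIOS` and `screen` are hypothetical.

```python
from fractions import Fraction as F

MASS_RATIO = 3  # asteroid mass / probe mass, from the problem statement

# (probe final velocity, asteroid final velocity), in units of v_p.
# Values follow our reading of the options listed in Fig. 7.
SCENARIOS = {
    "i": (F(0), F(1, 3)),
    "ii": (F(1, 4), F(1, 4)),
    "iii": (F(-1), F(2, 3)),
    "iv": (F(-1, 2), F(1, 2)),
}


def screen(v_probe, v_asteroid, mass_ratio=MASS_RATIO):
    """Return (momentum conserved?, energy allowed?, final KE / initial KE).

    Momentum is in units of m*v_p and kinetic energy in units of the
    probe's initial kinetic energy, with the asteroid assumed at rest
    before the collision.
    """
    p_final = v_probe + mass_ratio * v_asteroid
    ke_final = v_probe**2 + mass_ratio * v_asteroid**2
    return p_final == 1, ke_final <= 1, ke_final


for label, (v_s, v_a) in SCENARIOS.items():
    momentum_ok, energy_ok, ke_ratio = screen(v_s, v_a)
    print(label, momentum_ok, energy_ok, float(ke_ratio))
```

Under these assumptions all four options conserve momentum, so momentum alone cannot discriminate between them; only option iii would require the total kinetic energy to increase, and option iv corresponds to an elastic collision.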

V. RESULTS FROM THE CWP PILOT 

The results of the secondary implementation of PER- 
based teaching innovations described above are presented 
below as they relate to the learning objectives declared 
during the course design. The most directly measurable 
objectives are L2: reduce course failure rate, L3: increase
'end-of-chapter' problem-solving ability, and L6: double
conceptual learning gains.

A. Reduced Course Failure Rate 

A coarse-grained but important measure of instruc- 
tional success is the course failure rate. As discussed 



TABLE VIII. Average performance of AD campus students
on course exams with 'end-of-chapter' styled problems.

Course Offering          Mid-semester             Final
                         N     S ± s.e.           N     S ± s.e.
Fall 2009 (T)            65    0.61 ± 0.03        61    0.55 ± 0.03
Fall 2010 (T)            53    0.61 ± 0.02        46    0.57 ± 0.02
Spring 2011 (CWP)        50    0.77 ± 0.03        50    0.70 ± 0.02
ΔS(2011-2009) ± s.e.           +0.16 ± 0.04             +0.15 ± 0.04
ΔS(2011-2010) ± s.e.           +0.16 ± 0.04             +0.13 ± 0.03

above and as shown in Tab. VI, conditionally admitted
students show an alarmingly high course failure rate
(50%) under traditional instruction. One goal for the
reformed course, featuring the CWP instructional strategy,
was to reduce this rate by half. The pilot offering was
delivered exclusively to conditionally admitted students
and, as a consequence, was an ideal experiment to
determine the efficacy of the new teaching approach. For
students attempting the course for the first time, the
average fraction failing across all previous, traditionally
taught offerings to all students has been 0.24 ± 0.06
(standard error of the mean, with the average taken over
course offerings), but for conditionally admitted students
the fraction failing has been 0.50 ± 0.10. Coincidentally,
the failure rate for the reformed course is also 24%. Since
all students in the pilot were conditionally admitted, the
appropriate baseline is 50%, and there is evidence that
CWP instruction lowers the course failure rate by about
half, from 50% to 24%.



B. Improved traditional problem-solving skill 

Table VIII shows exam scores from a selection of course
offerings that have been analyzed to date. The major 
determiner for the course failure rate is student perfor- 
mance on summative course exams. One goal for the re- 
formed course was to increase the students' skill at solv- 
ing traditional 'end-of-chapter' style problems by 10% 
(L3), given problems of equal difficulty. This goal was 
motivated by and necessary to achieve the desired reduc- 
tion in course failure rate. Of course, it's non-trivial to 
establish the relative difficulty of a group of course exams, 
each with different problems, so that the performance on 
them can be compared. Our approach to solve this prob- 
lem was to query groups of professors and students unaf- 
filiated with the CWP project and ask them to perform a 
categorization and ranking task with the individual prob- 
lems. Problems from the midterm examinations, for all 
offerings from Fall 2009 to Spring 2011, were separately 
printed and randomly shuffled into a pile. Subjects of the
survey were then asked to: 1) group the problems based 
on the perceived similarity of the task (literally "group to- 
gether problems based on similarity of solution", as done 
famously by Chi, Feltovich, and Glaser (1981)^^) and 2)
rank the problems in each group according to difficulty.
This was repeated with problems from the final exams 
as well. Analysis of this data is still underway, but pre- 
liminary indications are that: 1) a problem from a CWP 
exam is present in every category introduced, and 2) a 
CWP problem is ranked as the most difficult in nearly 
all categories. This evidence supports the assertion that 
the mid-semester and final exams used to assess students 
in the reformed course are more difficult than the ex- 
ams in previous offerings. Consequently, directly com- 
paring raw CWP exam scores to scores from previous 
offerings should provide a valid and conservative lower 
bound for improved, traditional problem-solving performance.
With these caveats, improvement in traditional
problem-solving ability, as shown in Tab. VIII, is
approximately 15% and likely more.
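The score differences quoted for the exam comparison can be checked with a few lines of arithmetic. The sketch below assumes the standard errors of the two offerings are independent and combine in quadrature, which is a common convention but is not stated explicitly in the text; the function name `score_difference` and the `EXAMS` dictionary are hypothetical, and the numbers are taken from Table VIII.

```python
import math


def score_difference(s_new, se_new, s_old, se_old):
    """Difference of two mean scores and its standard error.

    Assumes the two means are independent, so their standard errors
    add in quadrature.
    """
    diff = s_new - s_old
    se_diff = math.sqrt(se_new**2 + se_old**2)
    return diff, se_diff


# (mean score, standard error) per offering, from Table VIII.
EXAMS = {
    "mid-semester": {"F2009": (0.61, 0.03), "F2010": (0.61, 0.02), "S2011": (0.77, 0.03)},
    "final": {"F2009": (0.55, 0.03), "F2010": (0.57, 0.02), "S2011": (0.70, 0.02)},
}

for exam, scores in EXAMS.items():
    s_cwp, se_cwp = scores["S2011"]
    for baseline in ("F2009", "F2010"):
        s_old, se_old = scores[baseline]
        diff, se = score_difference(s_cwp, se_cwp, s_old, se_old)
        print(f"{exam}: S2011 - {baseline} = {diff:+.2f} ± {se:.2f}")
```

With these inputs the computed differences round to the values listed in the last two rows of the table, which suggests the tabulated uncertainties were obtained in this way.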



C. Improved conceptual learning gains

Averaging over all traditionally taught (T) offerings of
the course, the normalized gain (or Hake's gain) of
conditionally admitted students on the FCI prior to CWP
instruction was $\langle g \rangle_{T}^{Hake} = 0.03 \pm 0.03$. The error here
is the standard error of the mean. Figure [SJ panel c), 
shows a histogram representation of the corresponding 
pre-test (black) and post-test (white) FCI distributions. 
The normalized gain using CWP instruction, including
both offerings on the Abu Dhabi and Sharjah campuses,
was $\langle g \rangle_{CWP}^{Hake} = 0.14 \pm 0.04$, a modest improvement.
Nevertheless, Cohen's effect size $d$ for the two
distributions of normalized gains is

$$ d = \frac{\langle g \rangle_{CWP} - \langle g \rangle_{T}}{\sigma_{CWP+T}} = 0.67, $$

where $\sigma_{CWP+T}$ is the standard deviation of individual
student gains of both data sets. The value $d = 0.67$
is moderately high, evidencing that the benefit of CWP
instruction to the average student is statistically
significant, enough to outweigh the relatively small sample
sizes. Of course, there is also ample room for further
improvement.
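For readers who wish to reproduce this kind of analysis on their own data, the two quantities used above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' analysis code: the normalized gain is Hake's standard definition applied to individual students, the default of 30 test items is an assumption about the FCI version used, and the effect size divides by the sample standard deviation of the merged gains, which is one reading of "the standard deviation of individual student gains of both data sets". The function names and the example numbers are hypothetical.

```python
import statistics


def normalized_gain(pre, post, max_score=30):
    """Hake's normalized gain for one student: (post - pre) / (max - pre)."""
    if pre >= max_score:
        raise ValueError("gain is undefined for a perfect pre-test score")
    return (post - pre) / (max_score - pre)


def effect_size(gains_a, gains_b):
    """Difference in mean gain divided by the standard deviation of the
    merged sample (an assumed reading of sigma_{CWP+T} in the text)."""
    sigma = statistics.stdev(list(gains_a) + list(gains_b))
    return (statistics.mean(gains_a) - statistics.mean(gains_b)) / sigma


# Made-up scores, for illustration only.
print(normalized_gain(pre=10, post=20))          # one student's gain
print(effect_size([0.2, 0.4], [0.0, 0.2]))       # two small groups
```

As a consistency check on the reported values, a mean difference of 0.14 - 0.03 = 0.11 together with d = 0.67 implies a standard deviation of individual gains of roughly 0.16.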

VI. DISCUSSION 

In this section, we discuss implications of the results 
presented above, in terms of the evidence they provide 
for answering the major research questions of this work 
(see Sec. I).

A. Is lecture-centered instruction in the US and in
the UAE contexts equivalent, in terms of course
structure, execution, and effect on student learning?

The case presented in this work from KU provides
compelling evidence that traditional, lecture-centered
instruction is similarly structured and produces similarly
poor student learning in both contexts. There are important
caveats, however, that are worth mentioning. Directly
admitted students to KU engineering programs,
while having similar learning gains to those of traditionally
taught engineering students in the US ($\langle g \rangle = 20 \pm 5\%$
vs. $\langle g \rangle = 22 \pm 2\%$, respectively), share very few
demographic traits with them (see Tab. VI). Unlike US
engineering majors, KU direct admission students are mostly
female, mostly expatriates, and mostly ELLs. Given the
growing body of significant research connecting issues of
equity in instructional practice to gender and ethnicity
(e.g. Refs. 61, 62, 70, 79, 80), it is quite possible that
similarities in learning gains are not the result of similar
causes. For instance, KU conditionally admitted students
have a male-to-female student ratio that is closer to that
of US engineering students, and the normalized gain for
traditional instruction is even lower, consistent with zero
in fact. But it is not yet clear if the gender distribution
is strongly correlated with this difference. The difference
in language level proficiency (as measured by IELTS)
between the two groups is likely equally, if not more,
important. Also, while there have been some recent studies
on reformed physics teaching with ELLs, there are very few
reports of situations where the language of instruction is
not also the language of the majority outside of the
classroom (such as Ref.^^). This is also the case at KU,
where the language of instruction is mostly used in
isolation on campus and opportunities for authentic use and
reinforcement off-campus are very limited.

FIG. 8. Comparison of average final exam scores for students
with a given overall IELTS score receiving either traditional
instruction or interactive engagement instruction with
the prototype CWP method.

All of these issues warrant further research and offer
potential for improved pedagogical innovations for
physics education in the UAE and other similar contexts.
For the case of language, since we used language level
proficiency in our teaming algorithm, we can draw some
preliminary conclusions on the efficacy of this practice.
The largest reduction in course failure rate is for students
with IELTS overall scores of 5.5, none of whom
failed the course. This group makes up 15% of the students
in the pilot offering of the new course and, based
on data from traditional instruction, half of them would
be expected to fail. There are not yet enough statistics to
draw strong conclusions, but the fact that none of these
students failed suggests a positive contribution toward
achieving L2: efficacy with 'at-risk' students, recalling
that IELTS 5.5 is the average score of conditionally
admitted students.

FIG. 9. Normalized Gain on FCI vs. IELTS overall score for
conditionally admitted students.



B. Does interactive-engagement instruction in the 
US and UAE contexts require similar measures for 
implementation and produce similar learning gains? 

To answer this question, one must first determine if 
interactive-engagement teaching has been achieved. As 
discussed in Ref.^, faculty engaged in a new reform ef- 
fort often think that they are teaching in a manner 
consistent with findings in the PER literature when in 
fact the classroom norms they establish suggest other- 
wise. If one considers high normalized gain on pre- 
/post- instruction conceptual inventories as the 'signal' 
of interactive-engagement success, then the present case 
is not clear-cut. Hake^ reports that, on average, successfully
implemented IE courses in introductory mechanics
have $\langle g \rangle = 0.48 \pm 0.14$ while traditional courses have
$\langle g \rangle = 0.23 \pm 0.04$. In the KU case, $\langle g \rangle$ for
conditionally admitted students improved from 0.03 ± 0.03 to
0.14 ± 0.04, but both values would still be categorized as
traditional if Hake's results are used as the definition of
IE versus traditional.

In spite of this, these authors assert that interactive- 
engagement teaching has been successfully implemented. 
Substantial care was taken to thoughtfully adapt the 
CGPS instructional strategy for recitations to the con- 
straints present for the KU first-semester introductory 
physics course and without compromising core features, 
including classroom norms, that are critical for reproduc- 
ing its published effectiveness. To support our argument, 
first we point out that improvements in gain of the size 
seen at KU are similar to other cases where only labo- 
ratory instruction was reformed and where no evidence 
exists to suggest that PER-based norms were absent from 
the lab setting. For example, in Ref.— , mean raw gain 
on FCI is similarly increased by about 9% as a result of 
reforming only the laboratory portion of the equivalent 
course. Similar results are reported by Hake for the
cases of UL-RM95S-C and UL-RM95Su-C ($\langle g \rangle = 0.25$
and $\langle g \rangle = 0.26$, respectively), where only the labs were
reformed, improving $\langle g \rangle$ over the traditional course
(UL-94F-C, $\langle g \rangle = 0.18$) by 0.07-0.08 (see Ref.^s for details).

We further support our argument by suggesting that
for KU and similar institutions (having large ELL
populations), $\langle g \rangle$ averaged over the entire class may be
too coarse-grained a measure to adequately 'detect'
interactive engagement teaching and the increased student
conceptual learning resulting from it. We hypothesize that
$\langle g \rangle$ is not a clear measure in our case because of a
strong connection between learning gains and proficiency
in the language of instruction (English). We evidence this
by extending the above discussion of language. Figure
9 shows $\langle g \rangle$ for FCI versus IELTS overall score. Here we
see a strong, positive correlation between IELTS score
and $\langle g \rangle$, for students taught traditionally (R = 0.88) and
in CWP (R = 0.82). But at very high language level
(IELTS 7.0, about 10% of either population), the difference
between traditional and CWP instruction is large
(0.16 ± 0.10 vs. 0.47 ± 0.08, respectively) and significant
(p < 0.001 via rank-sum test). Considering that
IELTS > 8.0 is considered 'mother-tongue' proficiency^!,
one might roughly assume learning for these students to
be largely free from second-language-related factors. If
so, then it appears that traditional instruction to KU
students with increasing English proficiency converges well
to Hake's result ($\langle g \rangle = 0.16 \pm 0.10$ and $0.23 \pm 0.04$,
respectively) in the limit of 'mother-tongue'-like proficiency.
The same convergence of CWP to Hake's IE result
($\langle g \rangle = 0.47 \pm 0.08$ and $0.48 \pm 0.14$, respectively) appears
in the same limit. All other major factors of the two groups
are accounted for: both groups are conditionally admitted
and have the same language ability. Both groups
contain roughly equal numbers of students from all major
demographic constituencies: males, females, expatriates,
and UAE nationals. Both groups have consistent expectations
associated with learning physics course content.
Thus, this increase in learning gains at IELTS 7.0, having
controlled or accounted for other major factors, is strong
evidence that gains were predominantly caused by pedagogical
reform and that interactive engagement teaching
was successfully implemented.
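The "convergence" to Hake's values described above can be made slightly more concrete by expressing each difference in units of the combined uncertainty. This is a rough sketch only: it treats every quoted uncertainty as a one-sigma spread and combines them in quadrature, which is our simplifying assumption (the KU values are standard errors, while the nature of the spreads quoted for Hake's results is not stated here). The function name `separation` is hypothetical.

```python
import math


def separation(value_a, sigma_a, value_b, sigma_b):
    """Absolute difference of two values in units of their combined
    uncertainty, with the uncertainties added in quadrature."""
    return abs(value_a - value_b) / math.sqrt(sigma_a**2 + sigma_b**2)


# KU students at IELTS 7.0 versus Hake's averages, values from the text.
traditional = separation(0.16, 0.10, 0.23, 0.04)
interactive = separation(0.47, 0.08, 0.48, 0.14)

print(f"traditional: {traditional:.2f} combined sigma")
print(f"CWP vs. IE:  {interactive:.2f} combined sigma")
```

Both separations come out below one combined sigma, which is consistent with the statement that the high-proficiency KU results agree with Hake's values within the quoted uncertainties.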



C. Can failure risks for secondary implementations 
of PER-based instruction be predicted and 
mitigated, even in the presence of large cultural 
differences with the primary context? 

An analysis of qualitative data on classroom norms in
the UAE, gathered both by these authors and from
the literature concerning surrounding institutions, shows
that there are indeed traits of the idio-cultural
context for UAE universities that differ in kind from those
encountered in the US. The role the university plays
in the larger UAE society, and how that informs the
motivations of prospective and current students, can
have a strong influence on a student's response to
PER-compatible classroom norms. This is largely determined
by the broader ethno/national-cultural context and cannot
be addressed by instruction, nor is it necessary to
do so, given the manner in which the expectations of
UAE national students are modified in narrower, classroom
contexts. What we present is a proof-of-principle:
one can generate from qualitative data on cultural values,
classroom behaviors, and institutional factors a set
of culturally- and contextually-informed design requirements
for a successful PER-based course, evaluate the
suitability of published PER-based pedagogies against
these requirements, and converge on one that likely has
the highest chance of success upon roll-out. Like any
proof-of-principle, this work offers no guarantee that the
same approach, tried under the same conditions, would
succeed; however, it should silence suggestions that such
an adaptation is impossible and doomed to failure.
Furthermore, a design-based approach like the one presented
in Sec. IV naturally and logically suggests ways in which
to thoughtfully modify a PER-based pedagogy while
respecting its core functions, in this case for CGPS.



VII. SUMMARY AND OUTLOOK 

We find that PER-based instructional strategies, orig- 
inally developed in North America and Europe, can 
and should be adapted for secondary implementation in 
physics classrooms in the UAE. The modifications to a 
published instructional strategy that are needed to make 
a secondary implementation successful are not trivial, 
but neither are they impossible. We have presented a 
thoughtful analysis of the cultural values of our students 
and how they differ from those of students for whom 
PER-based instruction was originally designed. From 
this, we have extrapolated what values our students bring 
to university with them as freshmen, how they are modified
by classroom context, and have converged on requirements
for a reformed course design, triangulating
on these conclusions using our own observations of student
behavior and those of other educators in neighboring
institutions. From this, we have followed an engineering
design-based evaluation of existing PER instructional
strategies and identified CGPS as best exploiting
our existing assets and providing the greatest number of
necessary core functions for a reformed course in our
context. The same evaluation also helped identify beneficial
modifications of CGPS and additions from the Tutorials-based
approach, which led us to focus our reform effort
on the laboratory and to create the Collaboration Workshop.
We believe that institutions in the Gulf region and
in the developing world in general, which are considering 
course reforms using PER-based tasks and instructional 
strategies, could benefit significantly from using this ap- 
proach as a reference. 

The need for such pedagogical reforms at KU was 
based on students' unsatisfactory course performance 
and conceptual learning, in courses that are essentially 
'traditional', as defined in the PER literature and that 
feature lecturing to passive audiences, algorithmic prob- 
lem exams and verification recipe labs. The poor concep- 
tual learning seen in North American studies for tradi- 
tional course formats is made worse when implemented 
for UAE national student populations, who are predom- 
inantly ELLs and who ascribe a very different societal 
role to university. A pilot set of two offerings of the Col- 
laboration Workshop approach, one on each of KU's two 
campuses, created to address these differences has pro- 
duced significant improvements in course DFW rate, tra- 
ditional problem-solving ability and conceptual learning 
gains. 

The Collaboration Workshop pilot has also produced 
valuable data that raises questions for future research, 
relevant for institutions similar to KU. Regarding is- 
sues related to pedagogy, the possible strong modulation 
of conceptual learning gains by language proficiency in 
the language of instruction requires further investigation. 
Distributing the high-language-level students was meant
to help overcome language barriers between students and
instructors and to enrich peer interactions. Student teams
for CWP were designed, whenever possible, to have at
least one high English-language-level member, but these
students, and not their teammates, appear to be the sole
recipients of improved conceptual learning gains.
Why is this the case? Also, distributing the
high-language-level students is often combinatorially frustrated
by the inability to create mixed-gender teams. Thus,
another important direction for future research is to find
other, similarly effective language-related coaching and
support methods that increase conceptual learning gains
for students at lower language levels.

Regarding issues related to assessment, despite giv- 
ing students their choice of FCI in English or Arabic, 
there is not yet enough evidence to confirm that the test 
has high validity with KU students. Are the normal- 
ized gains on FCI of lower language proficiency students 
hindered primarily by less conceptual learning, due to
language barriers faced during instruction, or by
these students' inability to comprehend the FCI questions
clearly? Many pre-test scores are alarmingly close to
random. Are most students guessing? Anecdotally, the
recent and rapid modernization of UAE nationals' society 
means that for many students Arabic and English liter- 
acy levels are often comparable and equally underdevel- 
oped. In other words, for written communication, there 
may be no 'mother-tongue' amongst many of our stu- 
dents. How does this impact the validity of the FCI and 
other conceptual instruments? If the effect is large, can
such an assessment instrument be modified to recover its
validity and, if so, how? The Collaboration Workshop
has also raised the on-campus visibility of pedagogical 
alternatives to traditional lecture. Though these authors 
present this work as a proof-of-principle, do other faculty 
and administrators view it as such? What are ways to
build upon 'early wins' produced in the pilot that will
allow for reforms to the lecture sections and course
exams, which would presumably further improve student
course performance and conceptual learning?
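
As a rough illustration of the guessing question raised above, the following sketch computes how likely a given pre-test raw score would be under pure random guessing, along with the normalized gain from class-average percent scores. It assumes the 30-item, five-option version of the FCI and independent, uniformly random answers on every item. The function names are hypothetical, and nothing here reproduces the analysis or data of this study.

```python
from math import comb

# Assumptions (not taken from this study): the 30-item FCI with
# five answer options per item, and independent uniform guessing.
FCI_ITEMS = 30
FCI_CHOICES = 5


def expected_chance_score(n=FCI_ITEMS, choices=FCI_CHOICES):
    """Expected number of correct answers under pure guessing."""
    return n / choices


def prob_at_least(k, n=FCI_ITEMS, choices=FCI_CHOICES):
    """P(X >= k) for X ~ Binomial(n, 1/choices).

    This is the probability that a student who guesses on every
    item gets k or more items correct.
    """
    p = 1.0 / choices
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def normalized_gain(pre_pct, post_pct):
    """Normalized gain (post - pre) / (100 - pre) from percent scores."""
    if pre_pct >= 100:
        raise ValueError("pre-test average must be below 100%")
    return (post_pct - pre_pct) / (100.0 - pre_pct)


if __name__ == "__main__":
    print("expected chance score:", expected_chance_score())
    for raw in (6, 9, 12):
        print(f"P(score >= {raw} by guessing) = {prob_at_least(raw):.3f}")
    print("gain for 20% -> 60%:", normalized_gain(20, 60))
```

A raw pre-test score that pure guessing would reach with high probability cannot, by itself, distinguish guessing from weak but genuine understanding; item-level response patterns or student interviews would be needed to address the validity questions posed here.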